Abstract

We study the problem of parameter estimation in Indirect Observability contexts, where $X_t \in \mathbb{R}^r$ is an unobservable stationary process parametrized by a vector of unknown parameters and all observable data are generated by an approximating process $Y_t^\varepsilon$ which is close to $X_t$ in $L^4$ norm. We construct consistent parameter estimators which are smooth functions of the sub-sampled empirical mean and empirical lagged covariance matrices computed from the observable data. We derive explicit optimal sub-sampling schemes specifying the best paired choices of sub-sampling time-step and number of observations. We show that these choices ensure that our parameter estimators reach optimized asymptotic $L^2$-convergence rates, which are constant multiples of the norm $\|Y_t^\varepsilon - X_t\|_4$.

Keywords: Parametric estimation, Non-Gaussian diffusions, Empirical covariance estimators, Indirect observability


1 Introduction 

The amount of available observational data has increased massively in recent years due to rapid technological advances in science and engineering. Often, it is desirable to fit an appropriate parametrized stochastic model to the available data and then use this model for forecasting, analysis, etc. Stochastic processes $X_t$ driven by systems of stochastic differential equations (SDEs) have often been used for this purpose, so that both parametric and non-parametric techniques for fitting SDEs to the available data have a rich history (see [8, 39, 46, 61] for a general overview). Non-parametric approaches for SDE data modeling have used Bayesian methods as in [27, 53, …, 68], exploited the spectral properties of the infinitesimal generator [20, 21, 37], or developed maximum likelihood function estimation as in [44, 45, 65], as well as drift and diffusion estimates by conditional expectations of process dynamics over short time intervals […], with potential use of kernel-based techniques as in [10, 69]. For parametrized SDE models, various moment-based parameter estimators (see [32] and references therein) have been implemented, as well as approximate maximum-likelihood parameter estimators after time discretization of the SDEs (see for instance […]). Minimum-contrast estimators have also been used for parametric estimation of diffusions [22, 35]. The asymptotic consistency and efficiency of SDE model fitting to data have been well analyzed in the literature, although computational issues remain key questions in high-dimensional applications.
In this paper we address the problem of parameter estimation for multi-dimensional diffusions in a specific context, namely in situations when the process $X_t$ to be modeled is not directly observable (i.e. $X_t$ is unobservable). Instead, the observed data are generated by an approximating process $Y_t^\varepsilon$ which involves a small “scaling” parameter $\varepsilon$, and the SDEs driving $X_t$ are discovered by asymptotic analysis as $\varepsilon \to 0$. Moreover, often the precise dynamics of the process $Y_t^\varepsilon$ is not known or is too complex to be explicitly formalized. In such Indirect Observability contexts, where the SDE system driving the unobservable $X_t$ is parametrized by an unknown vector $\theta$, but the observable $Y_t^\varepsilon$ is generated by unknown dynamics approximating the behavior of $X_t$, a natural goal is to efficiently estimate $\theta$ from the observable data $Y_t^\varepsilon$, assuming that $Y_t^\varepsilon \to X_t$ as $\varepsilon \to 0$ for some adequate norm. The main mathematical goal in this indirect observability context is to estimate the underlying SDE parameters by classes of estimators depending only on $\varepsilon$ and on the observable data $Y_t^\varepsilon$; of course, as $\varepsilon \to 0$, one wants these estimators to be consistent and approximately efficient. Recent references for consistent estimation of parametrized diffusions under indirect observability include […, 7, …, 24, 25, 35, 55, 56]. In financial applications, indirect observability situations emerge as soon as dynamic models involve the (unobservable) volatilities of assets, and their replacement by observable approximations of volatilities, such as “realized volatilities” computed from stock prices (see, e.g., […, 13, 15, 18, …]). Indirect observability is also a natural context for stock price dynamics based on precise noise microstructure (see, e.g., […]).

Our indirect observability framework covers a broad range of SDE systems driving unobservable multidimensional processes $X_t$, where the drift and diffusion terms are fairly generic smooth functions of an unknown parameter vector $\theta$. Such SDEs are widely used in engineering and automatic control, population evolution, atmospheric and ocean dynamics, stock price dynamics, options pricing, etc. to approximate the behavior of the leading variables of interest. To construct consistent estimators $\hat\theta$ of $\theta$ based on observed trajectories $Y^\varepsilon_{[0,T]}$ of the approximate data, a natural approach is to first derive “ideal” estimators as specific functionals $\Phi(X_{[0,T]})$ of the unobservable trajectory $X_{[0,T]}$, and then prove that under adequate conditions $\Phi(Y^\varepsilon_{[0,T]}) \to \theta$ as $\varepsilon \to 0$ and $N(\varepsilon) \to \infty$, where $N(\varepsilon)$ is the size of the observational data sample. In [4, 6, 7], we combined this approach with data sub-sampling to generate consistent estimators $\hat\theta$ based on approximate observable data $Y_t^\varepsilon$ with fairly generic joint distributions, but for Gaussian limiting processes $X_t$, and we also determined how nearly optimal sub-sampling rates should depend on $\varepsilon$. In this paper we extend our indirect observability analysis to stationary processes $X_t$ such that $Y_t^\varepsilon \to X_t$ in $L^4$ as $\varepsilon \to 0$, but with weak restrictions on the joint distributions of $X_t$ and $Y_t^\varepsilon$, which can both be non-Gaussian.

Discretized sub-sampling is standard to collect data from a continuous process. The observable data are then of the form $Y^\varepsilon_{n\delta}$, where the observational time-step $\delta$ is determined by data acquisition protocols for sensor recordings or by the computational time-step for observables generated by numerical PDE models. As is well known, adequate data sub-sampling can reduce computational overhead without sacrificing estimator accuracy. So we introduce a user-selected sub-sampling time-step $\Delta(\varepsilon)$, which will be a multiple of $\delta$ and will fix the observational sample $\{Y^\varepsilon_{n\Delta(\varepsilon)},\ n = 1, \dots, N(\varepsilon)\}$ retained to estimate $\theta$. We then address the key issue of how to optimize parameter estimator performance by seeking the “best” asymptotic choices for the number of observations $N(\varepsilon)$ and the sub-sampling time-step $\Delta(\varepsilon)$. We thus derive explicit relations between $\Delta(\varepsilon)$, $N(\varepsilon)$, and $\rho(\varepsilon) = \|Y_t^\varepsilon - X_t\|_4$ ensuring nearly optimal behavior for parameter estimation errors as $\varepsilon \to 0$. This is achieved by focusing on estimators which are (not necessarily explicit) smooth functions of empirical lagged moments up to order two computed from the $N(\varepsilon)$ sub-sampled observables $\{Y^\varepsilon_{n\Delta(\varepsilon)},\ n = 1, \dots, N(\varepsilon)\}$. Since in practice the available numbers of observations remain moderate, our nearly optimal choices for the pair $(N(\varepsilon), \Delta(\varepsilon))$ are constructed to simultaneously minimize the size of parameter estimation errors and the computational/observational complexity.
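As a minimal illustration of the sub-sampling step described in the preceding paragraph, the Python sketch below extracts the retained sample $\{Y^\varepsilon_{n\Delta},\ n = 1, \dots, N\}$ from data recorded on a finer $\delta$-grid. It assumes (for illustration only) that the recording is stored in a list whose $k$-th entry is the observation at time $k\delta$; the function name `subsample` is hypothetical.

```python
def subsample(records, delta, Delta, N):
    """Return [Y_{n*Delta} for n = 1..N] from data recorded every delta time units.

    Assumes records[k] holds the observation at time k*delta and that Delta is a
    positive integer multiple of delta, as required in the text.
    """
    stride = round(Delta / delta)
    if stride < 1 or abs(stride * delta - Delta) > 1e-9 * Delta:
        raise ValueError("Delta must be a positive integer multiple of delta")
    if N * stride >= len(records):
        raise ValueError("not enough recorded data for N sub-sampled points")
    return [records[n * stride] for n in range(1, N + 1)]

# Example: a recording every delta = 0.01, sub-sampled with Delta = 0.05.
print(subsample(list(range(100)), delta=0.01, Delta=0.05, N=5))  # prints [5, 10, 15, 20, 25]
```

Only every $(\Delta/\delta)$-th recorded value is kept, which is where the savings in storage and computation come from.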
The paper is organized as follows. Basic assumptions about our indirect observability setup are given in section 2. Our main results about speed of convergence for parameter estimators based on observable data and the associated characterization of optimal sub-sampling schemes are presented in section 5. In section 3 we outline the main class of parameter estimators studied here, namely smooth functions of lagged moments of order $\le 2$. Section 4 contains key technical results on consistency of moment estimators computed from unobservable data. Potential applications of our results to stationary multi-dimensional diffusions and to Heston SDEs are discussed in sections 2.2 and 6.

2 Indirect Observability Setup 

2.1 Basic Indirect Observability Hypotheses 

Notations. For any matrix $M = (M_{ij})$, we set $\|M\| = \max_{i,j} |M_{ij}|$, and $M^*$ is the transpose of $M$. The $L^p$ norm of a random matrix $M$ is denoted by $\|M\|_p = \mathbb{E}(\|M\|^p)^{1/p}$.
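The two norms just defined can be sketched in a few lines. In this illustrative Python sketch, matrices are lists of rows, and the expectation in $\|M\|_p$ is replaced by an empirical average over i.i.d. samples of $M$; that Monte Carlo substitution, and all function names, are assumptions of the example rather than part of the setup.

```python
def sup_norm(M):
    """||M|| = max_{i,j} |M_ij| for a matrix stored as a list of rows."""
    return max(abs(x) for row in M for x in row)

def transpose(M):
    """M* (real transpose) of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def empirical_lp_norm(samples, p):
    """Monte Carlo stand-in for ||M||_p = E(||M||^p)^(1/p), from i.i.d. samples of M."""
    if p < 1 or not samples:
        raise ValueError("need p >= 1 and at least one sample")
    return (sum(sup_norm(M) ** p for M in samples) / len(samples)) ** (1.0 / p)
```

Because the empirical average is itself an expectation under a probability measure, the usual monotonicity $\|M\|_2 \le \|M\|_4$ also holds for these empirical norms.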

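Indirect Observability. Our formal indirect observability setup $X_t = \lim_{\varepsilon \to 0} Y_t^\varepsilon$ involves

- a set of directly observable continuous time stochastic processes $Y_t^\varepsilon \in \mathbb{R}^r$ indexed by a small “scale” parameter $\varepsilon > 0$, with norms $\|Y_t^\varepsilon\|_4$ uniformly bounded for all $\varepsilon > 0$ and $t \ge 0$,

- an unobservable strictly stationary process $X_t = \lim_{\varepsilon \to 0} Y_t^\varepsilon$, where convergence holds at uniform $L^4$-speed $\rho(\varepsilon) \to 0$ as $\varepsilon \to 0$, so that

$$\|Y_t^\varepsilon - X_t\|_4 \le \rho(\varepsilon) \quad \text{for all } t \ge 0,\ \varepsilon > 0. \tag{1}$$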


Moreover, we will assume that

- the mean $\mu$ of $X_t$ and the lagged covariance matrices $K(u) = \mathbb{E}(X_t X_{t+u}^*) - \mu\mu^*$ are functions of an unknown parameter vector $\theta \in \Theta$, with $\Theta$ open in $\mathbb{R}^p$,

- as functions of the time lag $u$, the matrices $K(u)$ are locally Lipschitz, uniformly in $\theta$.

Since $X_t$ is not directly observable, our main goal here is to construct observable estimators $\hat\theta^\varepsilon$ of $\theta$, i.e. estimators depending only on the observable data $Y_t^\varepsilon$ and hence of the form

$$\hat\theta^\varepsilon = \Phi(Y^\varepsilon_{t_1}, \dots, Y^\varepsilon_{t_q}), \tag{2}$$

where the number $q$ of observables and the instants $t_1, \dots, t_q$ depend on $\varepsilon$, and each Borel function $\Phi : \mathbb{R}^{rq} \to \mathbb{R}^p$ is deterministic. As $\varepsilon \to 0$, achieving consistency in probability for the estimators $\hat\theta^\varepsilon$ will require specifying nearly optimal choices for $q(\varepsilon)$ and for the time grid $t_1(\varepsilon), \dots, t_q(\varepsilon)$, and clarifying how these choices depend on the approximation speed $\rho(\varepsilon)$.


2.2 Multi-dimensional Diffusions under Indirect Observability 

Stationary multi-dimensional diffusions. Many practical examples of indirect observability are linked to the approximation of high-dimensional multiscale systems by reduced stochastic differential equations obtained by averaging and homogenization. For instance, this is the case for homogenization applications to atmosphere-ocean science [26, 28, 29, 48, 49]. In those contexts the fully explicit mathematical dynamics of the observable $Y_t^\varepsilon$ is typically too complex to be fully modeled. However, the unobservable limit process $X_t \in \mathbb{R}^r$ is often modeled by a relatively low-dimensional strictly stationary multi-dimensional diffusion driven by an SDE system with coefficients depending smoothly on the parameter vector $\theta \in \Theta \subset \mathbb{R}^p$. These SDE systems, derived by an ad-hoc analysis of the main “mechanisms” generating the observables, are of the form

$$dX_t = b(X_t, \theta)\,dt + \sigma(X_t, \theta)\,dW_t, \tag{3}$$

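where

- $W_t$ is a multi-dimensional Brownian motion,

- the drift $b(x, \theta)$ and the matrix $\sigma(x, \theta)$ are functions of $x \in \mathbb{R}^r$ and $\theta \in \mathbb{R}^p$,

- the matrix $a(x, \theta) = \sigma(x, \theta)\sigma(x, \theta)^*$ is invertible for all $x$ and $\theta$.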

The transition density $p_\theta(t, x, y)$ of $X_t$ then depends smoothly on $\theta$ and is the fundamental solution of a parabolic PDE with coefficients $a$ and $b$. The literature (see for instance [30, …]) has extensively discussed these Fokker-Planck PDEs, as well as the elliptic PDE verified by the stationary density $q_\theta(x, y) = \lim_{t \to \infty} p_\theta(t, x, y)$, when $X_t$ is strictly stationary. In dimension 1, strict stationarity of $X_t$ is equivalent to integrability in $x \in \mathbb{R}$ of

$$p(x) = \frac{1}{a(x, \theta)} \exp\left( \int_0^x \frac{2\, b(y, \theta)}{a(y, \theta)}\, dy \right),$$

and (see [13]) the stationary density of $X_t$ is then proportional to $p(x)$. In higher dimensions, the literature does not seem to provide easy-to-use sufficient conditions for stationarity, so that stationarity has to be verified in each specific application.
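In dimension 1, the stationary density of (3) is proportional to $p(x) = a(x)^{-1} \exp\left(\int_0^x 2b(y)/a(y)\,dy\right)$ with $a = \sigma^2$, and this can be checked numerically. The Python sketch below (hypothetical helper `stationary_density_1d`, trapezoid-rule integration on a user-supplied finite grid) normalizes $p$ on that grid. The Ornstein-Uhlenbeck drift $b(x) = -\gamma x$ with constant $\sigma$ is used only as a test case, for which the result should match the Gaussian density with variance $\sigma^2/(2\gamma)$.

```python
import math

def stationary_density_1d(b, sigma, xs):
    """Stationary density of dX = b(X) dt + sigma(X) dW, normalized on the grid xs.

    Uses p(x) proportional to exp(int 2 b / a) / a with a = sigma^2. The integral
    starts at xs[0] instead of 0, which only multiplies p by a constant that the
    final normalization removes. xs must be strictly increasing.
    """
    a = [sigma(x) ** 2 for x in xs]
    g = [2.0 * b(x) / ax for x, ax in zip(xs, a)]
    cum = [0.0]  # cumulative trapezoid integral of g starting at xs[0]
    for k in range(1, len(xs)):
        cum.append(cum[-1] + 0.5 * (g[k] + g[k - 1]) * (xs[k] - xs[k - 1]))
    log_p = [c - math.log(ax) for c, ax in zip(cum, a)]
    top = max(log_p)  # shift before exponentiating to avoid overflow
    p = [math.exp(v - top) for v in log_p]
    z = sum(0.5 * (p[k] + p[k - 1]) * (xs[k] - xs[k - 1]) for k in range(1, len(xs)))
    return [v / z for v in p]

# OU test case with gamma = 1 and sigma^2 = 2: the stationary law should be N(0, 1).
xs = [-8.0 + 0.01 * k for k in range(1601)]
dens = stationary_density_1d(lambda x: -x, lambda x: math.sqrt(2.0), xs)
```

The finite grid truncates the real line, so the sketch is only meaningful when $p$ is negligible outside the chosen interval.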

Parameter estimation for diffusions. The abundant literature on parameter estimation for multi-dimensional diffusions such as $X_t$ (see overviews [39, 46, 61] and references therein) offers a broad range of parameter estimators which are non-explicit but numerically computable functions

$$\hat\theta = g_q(X_{t_1}, \dots, X_{t_q}) \tag{4}$$

of $q$ diffusion “data” $X_{t_1}, \dots, X_{t_q}$. Estimators of this type can for instance be numerically derived by approximate maximum likelihood after time discretization (see, e.g., […, 27, 33, …, 46, 58, 61]).

Under variously formulated sufficient conditions on the diffusion, maximum likelihood estimators of $\theta$ become asymptotically consistent for $q$ sufficiently large and for dense enough specific time grids. However, in our indirect observability setup, these estimators are obviously not “observable”, since they involve the non-observable diffusion data $X_{t_1}, \dots, X_{t_q}$. When “good” choices are known for the functions $g_q$ in (4), one can naturally attempt to construct consistent observable estimators $\hat\theta^\varepsilon$ by setting

$$\hat\theta^\varepsilon = g_q(Y^\varepsilon_{t_1}, \dots, Y^\varepsilon_{t_q}),$$

with “adequate” choices for the number of observations $q(\varepsilon)$ and for the time grid $t_1(\varepsilon), \dots, t_q(\varepsilon)$. A key technical goal in our paper is to “optimally” select $q(\varepsilon)$ and this time grid as functions of $\varepsilon$, and to link such nearly optimal sub-sampling schemes to the $L^4$-speed of approximation $\rho(\varepsilon)$ of $X_t$ by the observable $Y_t^\varepsilon$.

Sparsely parametrized stationary diffusions. For “uniformly elliptic” stationary diffusions $X_t$ driven by equation (3), the matrix $a$, its inverse $a^{-1}$, and the drift $b$ are classically assumed to be uniformly bounded for $x \in \mathbb{R}^r$ and $\theta \in \Theta$. The well-known Aronson bounds on the diffusion transition density $p_\theta(t, x, y)$ (see [3]) then imply the finiteness of all lagged moments of arbitrary order for $X_t$, and due to the smoothness of $p_\theta$, all these lagged moments are then necessarily smooth functions of $\theta$.

Conjecture. On the basis of multiple concrete examples of stationary diffusions, we conjecture that when the diffusion coefficients $a$ and $b$ in equation (3) are analytic functions of the $p$-dimensional parameter vector $\theta$, one can then find a vector $\Psi = [\Psi_1, \dots, \Psi_p]$ of $p$ lagged moments of order $\le 2$ of $X_t$ which uniquely determine $\theta$ as a (non-explicit) smooth function $\theta = G(\Psi)$ of $\Psi$. For such “sparsely parametrized” diffusions, parameter estimators based on estimation of lagged moments become natural targets, as outlined in the next section.


3 Parameter estimators based on lagged moments of order $\le 2$

3.1 Sparsely parametrized stationary processes 

The preceding conjecture on multidimensional stationary diffusions leads us to introduce the following definition.

Definition 1. We say that a strictly stationary process $X_t$ is sparsely parametrized by $\theta \in \mathbb{R}^p$ whenever one can find a vector $\Psi = [\Psi_1, \dots, \Psi_p]$ of $p$ lagged moments of order $\le 2$ of $X_t$, which uniquely determine $\theta$ as a smooth function $\theta = G(\Psi)$ of $\Psi$.

Let $X_t$ be a sparsely parametrized stationary process embedded in an indirect observability setup $X_t = \lim_{\varepsilon \to 0} Y_t^\varepsilon$. As $\varepsilon \to 0$, constructing consistent observable estimators for all lagged moments of order $\le 2$ of $X_t$ is then clearly equivalent to constructing consistent observable estimators $\hat\theta^\varepsilon$ for $\theta$. Indeed, if $\theta = G(\Psi)$, one can simply set $\hat\theta^\varepsilon = G(\Psi^\varepsilon)$, where $\Psi^\varepsilon$ is an observable estimator of the vector of moments $\Psi$. Observable estimators for the lagged 2nd order moments $\Psi_j$ are naturally provided by the empirical lagged covariances of $Y_t^\varepsilon$, and hence consistent observable estimators of $\theta$ can then be constructed as smooth functions of empirical lagged moments of order $\le 2$ of the observable data $Y_t^\varepsilon$.
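As an illustrative sketch (our example, not one worked out in the text), consider a stationary 1-dimensional Ornstein-Uhlenbeck process $dX_t = -\gamma X_t\,dt + \sigma\,dW_t$, for which $K(u) = \frac{\sigma^2}{2\gamma} e^{-\gamma u}$. With $\theta = (\gamma, \sigma)$ and $\Psi = (K(0), K(u_0))$ for a fixed lag $u_0 > 0$, the map $G$ is explicit: $\gamma = \log(K(0)/K(u_0))/u_0$ and $\sigma = \sqrt{2\gamma K(0)}$. The helper names below are hypothetical.

```python
import math

def ou_moments(gamma, sigma, u0):
    """Psi = (K(0), K(u0)) for a stationary 1-D OU process dX = -gamma X dt + sigma dW."""
    k0 = sigma ** 2 / (2.0 * gamma)
    return k0, k0 * math.exp(-gamma * u0)

def ou_G(psi, u0):
    """The explicit map theta = G(Psi) of this example, returning (gamma, sigma)."""
    k0, ku = psi
    if not (k0 > ku > 0.0):
        raise ValueError("G is defined here only for K(0) > K(u0) > 0")
    gamma = math.log(k0 / ku) / u0
    return gamma, math.sqrt(2.0 * gamma * k0)

# Round trip: G(Psi(theta)) recovers theta up to rounding.
print(ou_G(ou_moments(0.7, 1.3, 0.5), 0.5))
```

Plugging observable estimators of $K(0)$ and $K(u_0)$ into `ou_G` gives an estimator of the form $\hat\theta^\varepsilon = G(\Psi^\varepsilon)$ discussed above.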


3.2 Sub-sampled Empirical Moments 


As pointed out in many papers on indirect observability (see, e.g., […, 31, 55, 56]), correctly scaled sub-sampling of observable data can reduce computational overhead and decrease the bias of parameter estimators. A main technical point here will then be to select nearly optimal functions of $\varepsilon$ for the sub-sampling time step $\Delta(\varepsilon)$ of observable data as well as for the overall number $N(\varepsilon)$ of sub-sampled observables. So we now define adaptive sub-sampling schemes for empirical estimators of lagged covariances.

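Definition 2. Adaptive sub-sampling schemes will be specified by two functions of $\varepsilon$: the sub-sampling time step $\Delta = \Delta(\varepsilon) > 0$ and the number $N = N(\varepsilon)$ of sub-sampled observations, and we will impose the natural conditions

$$\Delta(\varepsilon) \to 0 \quad \text{and} \quad N(\varepsilon)\Delta(\varepsilon) \to \infty \quad \text{as } \varepsilon \to 0. \tag{5}$$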

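Fix an adaptive sub-sampling scheme $\Delta(\varepsilon), N(\varepsilon)$. Each time lag $u$ will then be approximated by $\kappa\Delta$ with the integer $\kappa$ given by

$$\kappa = \kappa(u, \varepsilon) = \text{closest integer to } \frac{u}{\Delta(\varepsilon)}. \tag{6}$$

Note that $\kappa(0, \varepsilon) = 0$, and that (5) implies, for every fixed lag $u > 0$,

$$\lim_{\varepsilon \to 0} \kappa(u, \varepsilon) = \infty \quad \text{and} \quad \lim_{\varepsilon \to 0} \frac{\kappa(u, \varepsilon)}{N(\varepsilon)} = 0.$$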
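A minimal sketch of the lag index (6) follows. It assumes ties are rounded upward, since the text does not specify a tie-breaking rule. The scheme $\Delta(\varepsilon) = \varepsilon$, $N(\varepsilon) = \varepsilon^{-2}$ in the loop is a hypothetical example satisfying (5); for it, at $u = 1$, $\kappa$ grows while $\kappa/N$ shrinks, as the limits above state.

```python
import math

def kappa(u, Delta):
    """Lag index (6): closest integer to u / Delta, ties rounded upward."""
    if Delta <= 0:
        raise ValueError("Delta must be positive")
    return math.floor(u / Delta + 0.5)

# Hypothetical scheme satisfying (5): Delta(eps) = eps, N(eps) = eps**-2.
for eps in (1e-1, 1e-2, 1e-3):
    Delta, N = eps, round(eps ** -2)
    k = kappa(1.0, Delta)
    print(f"eps={eps:g}  kappa={k}  kappa/N={k / N:.1e}")
```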

We estimate the mean $\mu$ of $X_t$ by the (observable) empirical mean

$$\bar Y^\varepsilon = \frac{1}{N} \sum_{n=1}^{N} Y^\varepsilon_{n\Delta}. \tag{7}$$
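Definition 3. For any given time lag $u$, let $\kappa = \kappa(u, \varepsilon)$ be as in equation (6). Define the time-shifted empirical mean $\tau_u \bar Y^\varepsilon$ by

$$\tau_u \bar Y^\varepsilon = \frac{1}{N} \sum_{n=1}^{N} Y^\varepsilon_{(n+\kappa)\Delta}. \tag{8}$$

We then estimate the lagged covariance matrix $K(u)$ of $X_t$ by the observable sub-sampled empirical covariances

$$\hat K_Y(u) = \frac{1}{N} \sum_{n=1}^{N} Y^\varepsilon_{n\Delta} \left(Y^\varepsilon_{(n+\kappa)\Delta}\right)^* - \bar Y^\varepsilon \left(\tau_u \bar Y^\varepsilon\right)^*, \tag{9}$$

where $N = N(\varepsilon)$, $\Delta = \Delta(\varepsilon)$, $\kappa = \kappa(u, \varepsilon)$. We always assume that $\varepsilon$ is small enough to force $N > \kappa$.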
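The estimators (7)-(9) translate directly into code. The following Python sketch treats the scalar case $r = 1$ for brevity (an assumption of the example). The list `obs` is assumed to hold the sub-sampled values with `obs[j]` $= Y^\varepsilon_{(j+1)\Delta}$, so at least $N + \kappa$ values are needed; the function names are hypothetical.

```python
def empirical_mean(obs, N):
    """(7): mean of Y_{n*Delta}, n = 1..N, where obs[j] holds Y_{(j+1)*Delta}."""
    return sum(obs[:N]) / N

def shifted_mean(obs, N, k):
    """(8): mean of Y_{(n+k)*Delta}, n = 1..N."""
    return sum(obs[k:k + N]) / N

def lagged_cov(obs, N, k):
    """(9), scalar case: (1/N) sum_n Y_{n D} Y_{(n+k) D} - mean * shifted mean."""
    if len(obs) < N + k:
        raise ValueError("need at least N + k sub-sampled observations")
    cross = sum(obs[j] * obs[j + k] for j in range(N)) / N
    return cross - empirical_mean(obs, N) * shifted_mean(obs, N, k)

print(lagged_cov([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], N=4, k=2))  # prints 1.25
```

Adding a constant to every observation leaves `lagged_cov` unchanged, which is the numerical counterpart of the invariance of the covariance estimator under centering.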


3.3 Sensitivity of Observable Covariances Estimators to Data Approximation 

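In formulas (7) and (9), replacing all observable $Y_t^\varepsilon$ by their limits $X_t$ transforms the observable estimators $\bar Y^\varepsilon$, $\tau_u \bar Y^\varepsilon$ and $\hat K_Y(u)$ into unobservable estimators $\bar X^\varepsilon$, $\tau_u \bar X^\varepsilon$ and $\hat K_X(u)$ given by

$$\bar X^\varepsilon = \frac{1}{N} \sum_{n=1}^{N} X_{n\Delta}, \tag{10}$$

$$\tau_u \bar X^\varepsilon = \frac{1}{N} \sum_{n=1}^{N} X_{(n+\kappa)\Delta}, \tag{11}$$

and

$$\hat K_X(u) = \frac{1}{N} \sum_{n=1}^{N} X_{n\Delta} X_{(n+\kappa)\Delta}^* - \bar X^\varepsilon \left(\tau_u \bar X^\varepsilon\right)^* = \frac{1}{N} \sum_{n=1}^{N} \left(X_{n\Delta} - \bar X^\varepsilon\right)\left(X_{(n+\kappa)\Delta} - \tau_u \bar X^\varepsilon\right)^*, \tag{12}$$

where, as before, $N = N(\varepsilon)$, $\Delta = \Delta(\varepsilon)$, $\kappa = \kappa(u, \varepsilon)$. The centered form in (12) shows that the covariance estimator remains unchanged when $X_t$ is replaced by the centered process $X_t - \mu$. We now evaluate the $L^2$- and $L^4$-norm perturbations induced on $\bar Y^\varepsilon$ and $\hat K_Y(u)$ when the observable data are replaced by their unobservable limits.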


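Proposition 1. Consider any indirect observability setup $X_t = \lim_{\varepsilon \to 0} Y_t^\varepsilon$ as in section 2.1. Let $\rho(\varepsilon)$ be the $L^4$-speed of convergence of $Y_t^\varepsilon$ to $X_t$. Let $\nu$ be an upper bound for all the $L^4$ norms $\|X_t\|_4$ and $\|Y_t^\varepsilon\|_4$. Fix an adaptive sub-sampling scheme $\Delta(\varepsilon), N(\varepsilon)$ verifying (5). Then, for all $\varepsilon > 0$, the observable estimators $\bar Y^\varepsilon$ and $\hat K_Y(u)$ defined by (7) and (9) and their unobservable versions $\bar X^\varepsilon$ and $\hat K_X(u)$ verify

$$\|\hat K_Y(u) - \hat K_X(u)\|_2 \le 4\nu\rho(\varepsilon) \quad \text{for all } u \ge 0, \tag{13}$$

$$\|\bar Y^\varepsilon - \bar X^\varepsilon\|_4 \le \rho(\varepsilon) \quad \text{and} \quad \|\tau_u \bar Y^\varepsilon - \tau_u \bar X^\varepsilon\|_4 \le \rho(\varepsilon). \tag{14}$$

Proposition 1 reduces the $L^2$-consistency analysis for our observable estimators of lagged moments to the $L^2$-consistency analysis for the unobservable moment estimators based on the $X_t$ data and given by (10), (11), (12).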
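Proof. We extend a similar proof given in [3]. The uniform bounds on $\|Y_t^\varepsilon - X_t\|_4$, $\|X_t\|_4$, $\|Y_t^\varepsilon\|_4$ are preserved by convex linear combinations, which yields

$$\|\bar Y^\varepsilon\|_4 \le \nu, \quad \|\bar X^\varepsilon\|_4 \le \nu, \quad \|\bar Y^\varepsilon - \bar X^\varepsilon\|_4 \le \rho(\varepsilon), \quad \|\tau_u \bar Y^\varepsilon - \tau_u \bar X^\varepsilon\|_4 \le \rho(\varepsilon). \tag{15}$$

Note that for any two random column vectors $A, B \in \mathbb{R}^m$, one has the elementary inequalities

$$\|AB^*\|_2 \le \|A\|_4 \|B\|_4 \quad \text{and} \quad \|A^*B\|_2 \le m\,\|A\|_4 \|B\|_4. \tag{16}$$

Let $V, W, V', W'$ be four random column vectors in $\mathbb{R}^m$, with $L^4$-norms at most $\nu$, and verifying

$$\|V - V'\|_4 \le \rho \quad \text{and} \quad \|W - W'\|_4 \le \rho. \tag{17}$$

By (16), the norms $\|V(W - W')^*\|_2$ and $\|(V - V')(W')^*\|_2$ are respectively bounded by $\|V\|_4 \|W - W'\|_4$ and $\|W'\|_4 \|V - V'\|_4$, so that

$$\|V(W - W')^*\|_2 \le \nu\rho \quad \text{and} \quad \|(V - V')(W')^*\|_2 \le \nu\rho.$$

Hence, the random matrices $VW^*$ and $V'(W')^*$ verify

$$\|VW^* - V'(W')^*\|_2 = \|V(W - W')^* + (V - V')(W')^*\|_2 \le 2\nu\rho. \tag{18}$$

The bound (18) can be applied when $V, W, V', W'$ are replaced by $\bar Y^\varepsilon, \tau_u \bar Y^\varepsilon, \bar X^\varepsilon, \tau_u \bar X^\varepsilon$, and this yields

$$\|\bar Y^\varepsilon (\tau_u \bar Y^\varepsilon)^* - \bar X^\varepsilon (\tau_u \bar X^\varepsilon)^*\|_2 \le 2\nu\rho(\varepsilon). \tag{19}$$

The bound (18) and relation (1) similarly show that

$$\|Y^\varepsilon_{n\Delta} (Y^\varepsilon_{(n+\kappa)\Delta})^* - X_{n\Delta} X_{(n+\kappa)\Delta}^*\|_2 \le 2\nu\rho(\varepsilon) \tag{20}$$

for all $n, \Delta, \kappa$. Convex combinations preserve this uniform $L^2$-bound, and hence

$$\Big\|\frac{1}{N} \sum_{n=1}^{N} \left[ Y^\varepsilon_{n\Delta} (Y^\varepsilon_{(n+\kappa)\Delta})^* - X_{n\Delta} X_{(n+\kappa)\Delta}^* \right]\Big\|_2 \le 2\nu\rho(\varepsilon). \tag{21}$$

By the definitions of $\hat K_Y(u)$ and $\hat K_X(u)$, the inequalities (19) and (21) conclude the proof.

□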
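The algebra behind (16)-(18) can be sanity-checked on samples. Since the empirical distribution of a finite sample is itself a probability measure, the scalar ($m = 1$) versions of these inequalities hold exactly for empirical $L^p$ norms, up to floating-point rounding. The Python sketch below (hypothetical helper names, Gaussian toy data chosen only for illustration) computes both sides of the intermediate bound $\|VW - V'W'\|_2 \le \|V\|_4 \|W - W'\|_4 + \|V - V'\|_4 \|W'\|_4$ used in (18).

```python
import random

def lp(xs, p):
    """Empirical L^p norm: (mean of |x|^p)^(1/p) over a finite sample."""
    return (sum(abs(x) ** p for x in xs) / len(xs)) ** (1.0 / p)

def sides_of_18(V, W, V2, W2):
    """Return (lhs, rhs) of ||VW - V'W'||_2 <= ||V||_4 ||W - W'||_4 + ||V - V'||_4 ||W'||_4.

    V2, W2 play the roles of V', W'; all inputs are equal-length scalar samples (m = 1).
    """
    lhs = lp([v * w - v2 * w2 for v, w, v2, w2 in zip(V, W, V2, W2)], 2)
    rhs = (lp(V, 4) * lp([w - w2 for w, w2 in zip(W, W2)], 4)
           + lp([v - v2 for v, v2 in zip(V, V2)], 4) * lp(W2, 4))
    return lhs, rhs

rng = random.Random(0)
X = [rng.gauss(0.0, 1.0) for _ in range(1000)]    # stands in for X_{n Delta}
Xs = [rng.gauss(0.0, 1.0) for _ in range(1000)]   # stands in for X_{(n+kappa) Delta}
Y = [x + 0.05 * rng.gauss(0.0, 1.0) for x in X]   # hypothetical proxies Y^eps
Ys = [x + 0.05 * rng.gauss(0.0, 1.0) for x in Xs]
print(sides_of_18(Y, Ys, X, Xs))                  # lhs should not exceed rhs
```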


4 $L^2$-consistency of Unobservable Sub-sampled Moments Estimators

Since Proposition 1 effectively controls in $L^2$-norm the difference between observable and unobservable estimators of lagged covariances, we now focus on the consistency of sub-sampled empirical covariance estimators based on the unobservable $X_t$. This will require assuming a fast enough asymptotic decorrelation speed for the process $X_t$.
4.1 Integrable Decorrelation Rates for 4th Order Moments

Definition 4. Let $X_t \in \mathbb{R}^r$ be a stationary process with uniformly bounded moments of order 4. Denote by $X_t(i)$, $i = 1, \dots, r$, the coordinates of $X_t$. For any time interval $U \subset \mathbb{R}^+$, let $\mathcal{F}_U$ be the set of all random variables of the form $X_s(i)$ or $X_s(i)X_t(j)$ for arbitrary $s, t \in U$ and $1 \le i, j \le r$. Let $f(T) > 0$ be a decreasing continuous function of $T > 0$ with finite integral $I(f) = \int_0^\infty f(T)\,dT$. We say that $X_t$ has integrable decorrelation rate $f(T)$ if for any disjoint time intervals $U$ and $V$ and for any random variables $G \in \mathcal{F}_U$ and $H \in \mathcal{F}_V$ one has

$$\left| \mathbb{E}(GH) - \mathbb{E}(G)\mathbb{E}(H) \right| \le f(T), \tag{22}$$

where $T = \min_{u \in U,\, v \in V} |v - u|$ is the time gap between $U$ and $V$.

Definition (22) controls up to moments of order 4 the “decorrelation” rate between $X_t$ and $X_{t+u}$ when $u \to \infty$, and in particular implies that the lagged covariance matrices $K(u)$ of the process $X_t$ decay to zero at rate $f(u)$ when $u \to \infty$. The converse is of course not true for non-Gaussian processes $X_t$, but when $X_t$ is Gaussian with lagged covariances $K(u)$ decaying to zero at rate $f(u)$, then $X_t$ necessarily has integrable decorrelation rate proportional to $f$, as can be seen from the classical formula

$$\mathbb{E}(Z_1 Z_2 Z_3 Z_4) = \sigma_{1,2}\sigma_{3,4} + \sigma_{1,3}\sigma_{2,4} + \sigma_{2,3}\sigma_{1,4},$$

where the four random variables $Z_j$ are centered and jointly Gaussian with covariances $\sigma_{m,n}$. The decorrelation condition (22) is also linked with classical mixing contexts. Let $\mathcal{F}(U)$ be the set of random variables measurable with respect to the sigma-algebra generated by all the $X_s$ with $s \in U$. To extend (22) to random variables $G \in \mathcal{F}(U)$ and $H \in \mathcal{F}(V)$, it is necessary to require a stronger “mixing” property for $X_t$. Recall that when $X_t$ is an ergodic stationary process having the $\phi$-mixing property (see, for instance, […] for a recent survey), there is a fixed decay rate $\phi(T) > 0$ tending to 0 as $T \to \infty$, such that for any disjoint time intervals $U$ and $V$ separated by a time gap $T > 0$, and for any pair of events $A, B$ verifying $\mathbf{1}_A \in \mathcal{F}(U)$ and $\mathbf{1}_B \in \mathcal{F}(V)$, one has

$$|P(B \mid A) - P(B)| \le \phi(T).$$

Provided the $X_t$ have suitably bounded moments, these uniform dependency decay rates will typically imply the validity of our condition (22) for some decorrelation function $f(T)$ deduced from $\phi(T)$.
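The Gaussian fourth-moment formula quoted in this subsection can be checked by simulation. In the illustrative Python sketch below, $Z = L\xi$ with $\xi$ a vector of i.i.d. standard normals and $L$ a user-chosen mixing matrix (an assumption of the example), so that $\sigma_{m,n} = \sum_k L_{mk} L_{nk}$. The Monte Carlo average of $Z_1 Z_2 Z_3 Z_4$ should approach the right-hand side of the formula, here $0.5 \cdot 0.41 + 0.3 \cdot 0.45 + 0.35 \cdot 0.1 = 0.375$.

```python
import random

def isserlis_check(L, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E(Z1 Z2 Z3 Z4) versus the Gaussian formula.

    Z = L xi with xi i.i.d. N(0, 1), so sigma_{m,n} = sum_k L[m][k] * L[n][k].
    L must have exactly 4 rows of equal length.
    """
    rng = random.Random(seed)
    d = len(L[0])
    s = [[sum(L[m][k] * L[n][k] for k in range(d)) for n in range(4)] for m in range(4)]
    exact = s[0][1] * s[2][3] + s[0][2] * s[1][3] + s[1][2] * s[0][3]
    total = 0.0
    for _ in range(n_samples):
        xi = [rng.gauss(0.0, 1.0) for _ in range(d)]
        z = [sum(L[m][k] * xi[k] for k in range(d)) for m in range(4)]
        total += z[0] * z[1] * z[2] * z[3]
    return total / n_samples, exact

L = [[1.0, 0.0, 0.0, 0.0],
     [0.5, 1.0, 0.0, 0.0],
     [0.3, 0.2, 1.0, 0.0],
     [0.1, 0.4, 0.3, 1.0]]
print(isserlis_check(L))  # (Monte Carlo estimate, exact value)
```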


4.2 Accuracy of Unobservable Sub-sampled Empirical Covariance Estimators 

We now prove $L^2$-consistency for the unobservable sub-sampled empirical estimators of lagged covariance matrices. As could be expected from earlier results, the integrable decorrelation rate of $X_t$ plays a crucial role here.

Proposition 2. Let $X_t \in \mathbb{R}^r$ be a stationary process with finite $L^4$-norm $\nu = \|X_t\|_4$ and with lagged covariance matrices $K(u)$ locally Lipschitz in $u$. Assume that $X_t$ has integrable decorrelation rate $f(T)$ as in (22) and set $I(f) = \int_0^\infty f(T)\,dT$. Fix any adaptive sub-sampling scheme $\Delta(\varepsilon), N(\varepsilon)$ verifying (5). Let $\hat K_X(u)$ be the (unobservable) sub-sampled empirical estimators of $K(u)$ defined by (12). Then the following two statements hold.

(I) For time lags $u$ in any bounded interval $[0, A]$, the estimators $\hat K_X(u)$ are consistent in $L^2$ as $\varepsilon \to 0$, with uniform speed of convergence given by

$$\|\hat K_X(u) - K(u)\|_2 \le \gamma \left( \frac{1}{\sqrt{N\Delta}} + \Delta \right), \tag{23}$$

where the constant $\gamma$ depends only on $A$, $I(f)$, $\nu$.

(II) As $\varepsilon \to 0$, the (unobservable) estimators $\bar X^\varepsilon$ given by (10) converge in $L^2$ (hence also in $L^1$) to the mean $\mu$ of $X_t$, with the speed of convergence given by

$$\|\bar X^\varepsilon - \mu\|_2 \le \frac{c}{\sqrt{N\Delta}}, \tag{24}$$

where the constant $c$ depends only on $I(f)$ and $\nu$.

Proof. The proof relies on a careful application of standard techniques, and is therefore given in the appendix. □

In Proposition 2, our key technical target was to identify how speeds of convergence depend on the adaptive scheme $N(\varepsilon), \Delta(\varepsilon)$, in order to optimize our adaptive sub-sampling schemes, first for our unobservable empirical estimators of lagged covariances (see Proposition 3 below), and further on for the corresponding observable covariance estimators.


Similarity Notation. Next, we introduce similarity notation for the limiting behavior of any two functions depending on a small parameter $\varepsilon$. In particular, for any two functions $G(\varepsilon)$ and $H(\varepsilon)$, we write $G \sim H$ whenever $\lim_{\varepsilon \to 0} G(\varepsilon)/H(\varepsilon)$ is finite and strictly positive.

The next proposition determines the optimal relationship between the number of observations $N(\varepsilon)$ and the sub-sampling time-step $\Delta(\varepsilon)$. Values of $\Delta(\varepsilon)$ smaller than this optimal choice will not improve the limiting convergence rate, but will lead to oversampling and, thus, unnecessary data acquisition and storage.

Proposition 3. Let $X_t$ be a stationary process in $\mathbb{R}^r$ with finite $L^4$-norm $\nu$, locally Lipschitz lagged covariances $K(u)$ and integrable decorrelation rate $f(T)$. Select arbitrary numbers of observations $N(\varepsilon)$ tending to $\infty$ as $\varepsilon \to 0$ and define the sub-sampling time steps by

$$\Delta(\varepsilon) = \frac{1}{N(\varepsilon)^{1/3}}. \tag{25}$$

The (unobservable) estimators $\hat K_X(u)$ associated to $\Delta(\varepsilon), N(\varepsilon)$ converge to the true $K(u)$ as $\varepsilon \to 0$, with the following $L^2$ speed, valid for all $0 \le u \le A$,

$$\|\hat K_X(u) - K(u)\|_2 \le \frac{C}{N(\varepsilon)^{1/3}}, \tag{26}$$

where $C$ is a constant determined by $A$, $I(f)$, $\nu$. Given the function $N(\varepsilon)$, the $L^2$ bounds in (26) define, up to multiplicative constants, the best $L^2$ speed of convergence obtainable under the generic assumptions of Proposition 2.

Proof. For $\|\hat K_X(u) - K(u)\|_2$, Proposition 2 provides a bound proportional to

$$B(\varepsilon) = \frac{1}{\sqrt{N\Delta}} + \Delta.$$

For any given $N(\varepsilon)$, the choice $\Delta(\varepsilon) \sim N^{-1/3}(\varepsilon)$ minimizes $B(\varepsilon)$ up to multiplicative constants, and one then has $B(\varepsilon) \sim N^{-1/3}(\varepsilon)$. This proves equation (26).

To check that the bound in equation (26) cannot be improved further, one simply needs to construct a process $X_t$ verifying all the hypotheses of Proposition 2 and such that there is an equivalence

$$\|\hat K_X(u) - K(u)\|_2 \sim \frac{1}{\sqrt{N\Delta}} + \Delta.$$

Therefore, in this case the choice $\Delta \sim N^{-1/3}$ yields the optimal sub-sampling strategy. Indeed, in dimension 1, given any continuous, piecewise $C^1$ and positive definite function $K(u)$ such that $|K(u)| \le f(u)$, where $f(u)$ is continuous, decreasing, and integrable, there is a strictly stationary centered Gaussian process $X_t \in \mathbb{R}$ with covariances $K(u)$ (see [63]). The 2nd order moments of $X_t$ must then have integrable decorrelation rate proportional to $f$, as seen above. For standard examples of stationary 1-dimensional Gaussian processes, the estimators $\hat K_X(u)$ do achieve $L^2$-errors of estimation which are actually equivalent to $1/\sqrt{N\Delta} + \Delta$. See for instance [6, 7], where the case of the Ornstein-Uhlenbeck process is studied in detail. □
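The minimization step in the proof of Proposition 3 can be made explicit. For $B(\Delta) = 1/\sqrt{N\Delta} + \Delta$, setting $dB/d\Delta = 0$ gives $\Delta^* = (4N)^{-1/3}$ and $B(\Delta^*) = 3\Delta^* = 3\,(4N)^{-1/3}$, consistent with $\Delta \sim N^{-1/3}$ and $B \sim N^{-1/3}$. The short Python sketch below (hypothetical function names) checks this numerically; it concerns only the shape of the bound (23), not the unknown constant $\gamma$.

```python
import math

def bound_shape(N, Delta):
    """Shape of the bound (23): 1/sqrt(N*Delta) + Delta (the constant gamma is omitted)."""
    return 1.0 / math.sqrt(N * Delta) + Delta

def best_Delta(N):
    """Exact minimizer over Delta > 0 of bound_shape(N, Delta), namely (4N)^(-1/3)."""
    return (4.0 * N) ** (-1.0 / 3.0)

for N in (1e3, 1e6, 1e9):
    D = best_Delta(N)
    # N^(1/3) * B(D) should equal the constant 3 * 4^(-1/3), about 1.89, for every N.
    print(N, D, bound_shape(N, D) * N ** (1.0 / 3.0))
```

Since $B$ is convex in $\Delta$, moving $\Delta$ away from $\Delta^*$ in either direction can only increase the bound.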

5 Accuracy of Sub-sampled Moments Estimators under Indirect Observability

We now study $L^2$-consistency for observable sub-sampled empirical estimators of lagged covariance matrices. The $L^4$-speed $\rho(\varepsilon)$ at which $Y_t^\varepsilon$ approximates $X_t$ is of course application-specific. One of our goals was to analyze how $\rho(\varepsilon)$ determines optimal sub-sampling schemes for collecting observable data. This is achieved in the following result.

Theorem 1. Let $X_t = \lim_{\varepsilon \to 0} Y_t^\varepsilon$ be an indirect observability setup in $\mathbb{R}^r$ as in section 2. Call $\mu$ and $K(u)$ the mean and lagged covariance matrices of $X_t$, respectively. Let $\rho(\varepsilon)$ be the $L^4$-speed at which $Y_t^\varepsilon$ approximates $X_t$ for all $t$. Let $\nu$ be an upper bound of all the $\|X_t\|_4$ and $\|Y_t^\varepsilon\|_4$. Assume that $X_t$ has integrable decorrelation rate $f(T)$.

For each adaptive sub-sampling scheme $\Delta(\varepsilon), N(\varepsilon)$ verifying $\Delta(\varepsilon) \to 0$ and $N(\varepsilon)\Delta(\varepsilon) \to \infty$ as $\varepsilon \to 0$, formula (9) defines observable sub-sampled estimators $\hat K_Y(u)$ of $K(u)$. For each $u$, let $\mathcal{E}(u)$ be the set of all such observable estimators.

Then among all the observable estimators $\hat K_Y(u)$ in $\mathcal{E}(u)$, the best achievable $L^2$-speed of convergence to the true $K(u)$ as $\varepsilon \to 0$ is equivalent to some constant multiple of $\rho(\varepsilon)$ and is achieved by any sub-sampling scheme of the form

$$N(\varepsilon) \sim \rho(\varepsilon)^{-3}, \qquad \Delta(\varepsilon) \sim \rho(\varepsilon). \tag{27}$$

For any such sub-sampling scheme and any fixed $A$, one indeed has

$$\|\hat K_Y(u) - K(u)\|_2 \le C\rho(\varepsilon) \quad \text{for all } 0 \le u \le A, \tag{28}$$

$$\|\bar Y^\varepsilon - \mu\|_2 \le C\rho(\varepsilon), \tag{29}$$

for some constant $C$ determined by $A$, $\nu$ and $I(f) = \int_0^\infty f(T)\,dT$.

Let $S(\varepsilon) \sim N(\varepsilon)\Delta(\varepsilon)$ be the observational timespan gathering the time indexes $n\Delta$ of all the observables $Y^\varepsilon_{n\Delta}$ involved in the estimator $\hat K_Y(u)$. Each optimized sub-sampling scheme of the form (27) also minimizes the rate at which $S(\varepsilon) \to \infty$ as $\varepsilon \to 0$.

Theorem 1 will be proved right after the following more technical proposition.

Proposition 4. Let $X_t = \lim_{\varepsilon \to 0} Y_t^\varepsilon$ be an indirect observability setup in $\mathbb{R}^r$ verifying all the assumptions of Theorem 1. Let $\rho(\varepsilon)$ be the $L^4$-speed at which $Y_t^\varepsilon$ approximates $X_t$ for all $t$. Call $\mu$ and $K(u)$ the mean and lagged covariance matrices of $X_t$. Moreover, fix any adaptive sub-sampling scheme $\Delta(\varepsilon), N(\varepsilon)$ such that $\lim_{\varepsilon \to 0} N(\varepsilon) = \infty$ and $\Delta(\varepsilon) \sim N^{-1/3}(\varepsilon)$.

By formulas (9) and (7), this sub-sampling scheme defines observable estimators $\hat K_Y(u)$ and $\bar Y^\varepsilon$ of $K(u)$ and $\mu$, respectively. Then all these estimators are $L^2$-consistent as $\varepsilon \to 0$, and for any fixed $A > 0$ one has the uniform $L^2$-speeds of convergence

$$\|\hat K_Y(u) - K(u)\|_2 \le 4\nu\rho(\varepsilon) + c\,N^{-1/3}(\varepsilon) \quad \text{for all } 0 \le u \le A, \tag{30}$$

$$\|\bar Y^\varepsilon - \mu\|_2 \le \rho(\varepsilon) + c\,N^{-1/3}(\varepsilon), \tag{31}$$

where $c$ is a constant determined by $A$, $\nu$, $I(f)$.

Proof. The bound in (30) is a direct consequence of the bounds obtained for $\|\hat K_Y(u) - \hat K_X(u)\|_2$ and $\|\hat K_X(u) - K(u)\|_2$ in Propositions 1 and 2. Similar arguments prove the bound in (31). □
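To see numerically why there is no point in pushing $N^{-1/3}$ far below $\rho$ in (30), the sketch below evaluates the right-hand side $4\nu\rho + c\,N^{-1/3}$ together with the observational timespan $S = N\Delta = N^{2/3}$ obtained for $\Delta = N^{-1/3}$. The values of $\nu$, $\rho$ and $c$ are hypothetical placeholders, since $c$ is not explicit in the text.

```python
def rhs_30(N, nu, rho, c):
    """Right-hand side of (30): 4*nu*rho + c*N^(-1/3)."""
    return 4.0 * nu * rho + c * N ** (-1.0 / 3.0)

def timespan(N):
    """Observational timespan S = N*Delta with Delta = N^(-1/3), i.e. N^(2/3)."""
    return N ** (2.0 / 3.0)

nu, rho, c = 1.0, 1e-2, 1.0  # hypothetical values for illustration only
for N in (1e3, 1e6, 1e9, 1e12):
    print(f"N={N:.0e}  bound={rhs_30(N, nu, rho, c):.4f}  S={timespan(N):.0e}")
```

Beyond $N \approx \rho^{-3}$ the bound can shrink by at most the factor $(4\nu + c)/(4\nu)$, while $S$ keeps growing like $N^{2/3}$.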

We can now come back to proving Theorem 1.


Proof of Theorem 1. Taking $\Delta \sim N^{-1/3}$, and hence $S \sim N^{2/3}$, we apply Proposition 4. The bound $B(\varepsilon) = 4\nu\rho + c\,N^{-1/3}$ in equation (30) is larger than $4\nu\rho$, so that to minimize $B(\varepsilon)$ up to multiplicative constants there is no asymptotic advantage in taking $N^{-1/3} \ll \rho$. To simultaneously minimize (up to multiplicative constants) both $S \sim N^{2/3}$ and $B(\varepsilon)$, a natural choice is then to take $N^{-1/3} \sim \rho(\varepsilon)$, which yields $N \sim \rho^{-3}$ and hence $\Delta \sim \rho$. With, for instance, $N^{-1/3}(\varepsilon) = \rho(\varepsilon)$, $B(\varepsilon)$ equals $(4\nu + c)\rho$, which proves equation (28). The bound on $\|\bar Y^\varepsilon - \mu\|_2$ provided by (31) is then equal to $(1 + c)\rho$, which proves equation (29).

To show that the $L^2$-speeds of convergence in (28) and (29) cannot be generically improved for observable sub-sampled covariance matrix estimators in the class $\mathcal{E}(u)$, consider a 1D centered Gaussian process $X_t$ with preassigned covariance function $K_X(u)$ assumed to be piecewise $C^1$ and to decay at an integrable rate $f(u)$ as $u \to \infty$. Next, define $Y_t^\varepsilon = X_t + \rho(\varepsilon) X_t$, where $\rho(\varepsilon)$ is any function such that $\lim_{\varepsilon \to 0} \rho(\varepsilon) = 0$. Then $\|Y_t^\varepsilon - X_t\|_4 = C\rho(\varepsilon)$ and all the hypotheses of Proposition 4 are satisfied. Moreover, $Y_t^\varepsilon$ is a centered Gaussian process with lagged covariances $K_X(u)(1 + 2\rho(\varepsilon) + \rho^2(\varepsilon))$. Then, for any adaptive sub-sampling scheme $\Delta(\varepsilon), N(\varepsilon)$, the norm $h(\varepsilon)^2 = \mathbb{E}[(\hat K_Y(u) - K(u))^2]$ can be explicitly computed in terms of $N, \Delta, \rho$ and moments of $X_t$, and it can be checked that one always has $\liminf_{\varepsilon \to 0} h(\varepsilon)/\rho(\varepsilon) > 0$. Since we already know that the optimized explicit sub-sampling scheme (27) does yield $h(\varepsilon) \sim \rho(\varepsilon)$, this class of specific Gaussian examples proves generic optimality for the announced speed of convergence. □
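A sub-sampling scheme with the rates $N(\varepsilon) \sim \rho(\varepsilon)^{-3}$ and $\Delta(\varepsilon) \sim \rho(\varepsilon)$ is straightforward to set up once $\rho(\varepsilon)$ is known or estimated. The Python sketch below uses unit constants by default; the constants `a` and `b` are hypothetical knobs, since the theorem only fixes the rates up to multiplicative constants. It returns $N$, $\Delta$ and the timespan $S = N\Delta$, which then scales like $\rho^{-2}$.

```python
import math

def scheme_27(rho, a=1.0, b=1.0):
    """Scheme with N ~ a * rho^-3 observations at sub-sampling step Delta ~ b * rho."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie in (0, 1)")
    N = math.ceil(a * rho ** -3)
    Delta = b * rho
    return N, Delta, N * Delta

for rho in (0.1, 0.05, 0.01):
    N, Delta, S = scheme_27(rho)
    print(f"rho={rho}: N={N}, Delta={Delta}, S={S:.1f}")
```

Halving $\rho$ multiplies $N$ by about 8 and the observational timespan $S$ by about 4.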

Previous results presented in this section have the following direct consequence for sparsely 
parametrized stationary processes. 


Theorem 2. Let $X_t = \lim_{\varepsilon \to 0} Y_t^\varepsilon$ be an indirect observability setup in $\mathbb{R}^r$ verifying all the assumptions of Theorem 1. Let $\rho(\varepsilon)$ be the $L^4$-speed at which $Y_t^\varepsilon$ approximates $X_t$ for all $t$. Assume also that $X_t$ is sparsely parametrized by $\theta \in \Theta \subset \mathbb{R}^p$ as in Definition 1.

Then there exist observable estimators $\hat\theta^\varepsilon$ converging in probability to $\theta$ as $\varepsilon \to 0$. After selecting any sub-sampling scheme $\Delta(\varepsilon), N(\varepsilon)$ of the form (27), one can construct these estimators by an expression of the form

$$\hat\theta^\varepsilon = G(\Psi^\varepsilon) \quad \text{with} \quad \Psi^\varepsilon = [\Psi_1^\varepsilon, \dots, \Psi_p^\varepsilon], \tag{32}$$

where $G : \mathbb{R}^p \to \mathbb{R}^p$ is a fixed smooth function and each $\Psi_j^\varepsilon$ is an observable sub-sampled empirical estimator of the lagged moment $\Psi_j$ (of order $\le 2$) of $X_t$. Moreover, if $\Theta$ is included in some known Euclidean ball of finite radius, the truncated observable estimators then converge in $L^2$ norm to $\theta$ as $\varepsilon \to 0$, with $L^2$ speed of convergence faster than $C\rho(\varepsilon)$, for some constant $C$.


Proof. Definition ?? states that $\theta = G(\Psi)$, where the vector $\Psi = (\Psi_1, \dots, \Psi_p)$ involves $p$ lagged moments of order $\le 2$ of the unobservable process $X_t$ and $G$ is a smooth function. Select an optimized adaptive sub-sampling scheme of the form (27), and let $\hat{\Psi}^{\varepsilon}_j$ be the associated observable sub-sampled empirical estimator of the lagged 2nd order moment $\Psi_j$. By Theorem ??, as $\varepsilon \to 0$, the estimators $\hat{\Psi}^{\varepsilon}_j$ converge to $\Psi_j$ in $L^2$, and hence also converge to $\Psi_j$ in probability. Since convergence in probability is preserved by continuous (in particular smooth) functions, the estimators $\hat{\theta}^{\varepsilon}$ given by equation (32) must then converge in probability to $\theta$ as $\varepsilon \to 0$. For uniformly bounded random vectors, convergence in probability always implies convergence in $L^2$, and this proves the last statement of the theorem. □
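A minimal sketch of the plug-in construction (32), assuming a scalar Ornstein-Uhlenbeck model $dX_t = -\gamma X_t\,dt + \sigma\,dW_t$, which is an illustrative choice and not one of the examples treated here. Then $\theta = (\gamma, \sigma^2)$, $\Psi = (K(0), K(u))$, and $G$ follows from $K(u) = \frac{\sigma^2}{2\gamma}e^{-\gamma u}$. For simplicity the sketch samples $X_t$ itself; in the indirect observability setting the array `x` would be replaced by the sub-sampled observable data $Y^{\varepsilon}_{n\Delta}$. The helper names and the truncation box are hypothetical.

```python
import numpy as np

def simulate_ou(gamma, sigma, delta, n, rng):
    """Exact sampling of the stationary 1D OU process dX = -gamma X dt + sigma dW at times n * delta."""
    a = np.exp(-gamma * delta)
    var = sigma**2 / (2.0 * gamma)                  # stationary variance K(0)
    x = np.empty(n)
    x[0] = rng.normal(0.0, np.sqrt(var))            # start in the stationary law
    noise = rng.normal(0.0, np.sqrt(var * (1.0 - a**2)), size=n - 1)
    for i in range(1, n):
        x[i] = a * x[i - 1] + noise[i - 1]
    return x

def lagged_cov(x, k):
    """Empirical lagged covariance of the sub-sampled series at integer lag k."""
    m = len(x) - k
    head, tail = x[:m], x[k:k + m]
    return np.mean(head * tail) - head.mean() * tail.mean()

def G(psi0, psiu, u):
    """Smooth map (K(0), K(u)) -> (gamma, sigma^2), using K(u) = K(0) exp(-gamma u)."""
    gamma = -np.log(psiu / psi0) / u
    return gamma, 2.0 * gamma * psi0

rng = np.random.default_rng(0)
delta, u, n = 0.05, 0.5, 200_000                    # time step, lag, number of observations
x = simulate_ou(gamma=1.0, sigma=1.0, delta=delta, n=n, rng=rng)
k = int(round(u / delta))
gamma_hat, sigma2_hat = G(lagged_cov(x, 0), lagged_cov(x, k), k * delta)
# Truncation onto a known bounded set, playing the role of the ball in Theorem 2
gamma_hat, sigma2_hat = np.clip([gamma_hat, sigma2_hat], 0.01, 10.0)
print(gamma_hat, sigma2_hat)
```

Both estimates should land near the true values $\gamma = \sigma^2 = 1$; the clipping step only matters when an estimate leaves the a priori box.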

6 Applications to indirectly observable multi-dimensional diffusions

We now present several examples of stationary multi-dimensional diffusions $X_t$ naturally embedded in indirect observability frameworks. A key assumption to prove $L^2$-consistency for natural observable estimators of the lagged covariances $K(u)$ of $X_t$ is to require the lagged moments of order $\le 4$ of $X_t$ to decay at an integrable decorrelation rate for large lags, as specified in equation (22).

Published literature does not provide easy generic conditions on SDE coefficients guaranteeing that the associated diffusion $X_t$ has an integrable decorrelation rate as specified in (22). Quite relevant exponential decay bounds for the transition density $p_\theta(t, x, y)$ as $t \to \infty$ have been given in [31, 23, 52, ??], but more precise bounds on $p_\theta$ are needed to generically validate the integrable decorrelation rates on lagged moments of order 4 required by equation (22). Here we do not attempt to solve these technical questions for general classes of diffusions. Instead, we simply list a few interesting examples of diffusions for which our assumptions can either be directly verified, or are quite plausibly conjectured to be true, as can also be tested by numerical simulations.
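For a Gaussian process the decorrelation required in (22) can be computed in closed form, which gives a convenient sanity check for such numerical tests. The sketch below assumes a scalar Ornstein-Uhlenbeck process with unit variance and $K(h) = e^{-\gamma|h|}$ (a one-dimensional gradient diffusion with quadratic potential) and uses Isserlis' theorem; for non-Gaussian diffusions the same quantity would have to be estimated by Monte Carlo. The function names are illustrative.

```python
import math

def ou_cov(h, gamma=1.0):
    """Covariance K(h) = exp(-gamma |h|) of a stationary OU process with unit variance."""
    return math.exp(-gamma * abs(h))

def product_decorrelation(s, t, u, v, T, gamma=1.0):
    """Exact E(GH) - E(G)E(H) for G = X_s X_t, H = X_{u+T} X_{v+T}, X centered Gaussian.

    By Isserlis' theorem, E[X1 X2 X3 X4] = K12 K34 + K13 K24 + K14 K23,
    so the covariance of the two products is K13 K24 + K14 K23.
    """
    u2, v2 = u + T, v + T
    return (ou_cov(u2 - s, gamma) * ou_cov(v2 - t, gamma)
            + ou_cov(v2 - s, gamma) * ou_cov(u2 - t, gamma))

for T in (1.0, 2.0, 4.0, 8.0):
    # Decays like a constant multiple of exp(-2 gamma T), an integrable rate
    print(T, product_decorrelation(0.0, 0.5, 1.0, 1.5, T))
```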

Gradient Diffusions. In section 7.1 of [36], M. Hairer discusses the “gradient diffusions” $X_t$ in $\mathbb{R}^r$ driven by the SDE

$$dX_t = -\nabla Q(X_t)\,dt + \sigma\,dW_t,$$

where $Q(x)$ is a smooth “potential” defined for $x \in \mathbb{R}^r$, and $\sigma$ is a constant $r \times r$ matrix. The potential $Q$ is also assumed to behave as a polynomial at infinity, i.e. there are constants $c, C, k$ such that

$$c|x|^{2k} \le Q(x) \le C|x|^{2k}, \qquad \langle x, \nabla Q(x)\rangle \ge c|x|^{2k}, \qquad |D^2 Q(x)| \le C|x|^{2k-2},$$

where $\nabla$ is the gradient operator and $D$ denotes the first-order differentiation operator, so that $D^2 Q$ is the Hessian of $Q$. Under these conditions, [36] proves that the probability distribution of $X_t$ given $X_0 = x$ converges to the stationary probability distribution of $X_t$ at an exponentially fast rate as $t \to \infty$, and that $X_t$ verifies the classical Doeblin property. Results of [36] imply the exponentially fast decorrelation

$$|E(GH) - E(G)E(H)| \le C_0\, e^{-\gamma T},$$

where $\gamma > 0$ is a constant, but only for random variables $G, H$ of the form $G = \varphi_1(X_s)\varphi_2(X_t)$ and $H = \varphi_3(X_u)\varphi_4(X_v)$ with bounded functions $\varphi_i$ and $s \le t \le t + T \le u \le v$. Whenever the Aronson bounds [3] on transition densities hold, this decorrelation inequality can be extended to $G = X_s X_t$ and $H = X_u X_v$, so that the “gradient diffusions” $X_t$ provide a class of stationary diffusions with integrable decorrelation rate $f(T) = C_0 e^{-\gamma T}$ and finite moments of order 4.

Whenever a given multidimensional diffusion process $X_t$ verifies all our assumptions on the unobservable process, the simplest example of observable processes associated to $X_t$ is generated by local smoothing, i.e.

$$Y_t^{\varepsilon} = \frac{1}{\varepsilon}\int_{t-\varepsilon}^{t} X_s\,ds.$$

Such observable processes are often generated by sensors recording short-term averages of high frequency input data. See [6] for a detailed study of this case when $X_t$ is a Gaussian diffusion.
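A discrete sketch of local smoothing, assuming $X_t$ is a unit-variance Ornstein-Uhlenbeck process simulated on a fine grid of step `dt` (an illustrative choice); the integral is replaced by a running mean over `round(eps / dt)` grid points. The hypothetical helper `rho_hat` gives a crude empirical estimate of $\rho(\varepsilon) = \|Y_t^{\varepsilon} - X_t\|_4$, which shrinks with $\varepsilon$.

```python
import numpy as np

def simulate_ou_fine(n, dt, gamma, rng):
    """Stationary unit-variance OU path on a fine grid of step dt (exact recursion)."""
    a = np.exp(-gamma * dt)
    noise = rng.normal(0.0, np.sqrt(1.0 - a**2), size=n)
    x = np.empty(n)
    x[0] = rng.normal()
    for i in range(1, n):
        x[i] = a * x[i - 1] + noise[i]
    return x

def local_smoothing(x, dt, eps):
    """Discrete Y_t = (1/eps) * integral over [t - eps, t] of X_s ds, as a running mean.

    Returns (y, w) where y[i] is the mean of x[i : i + w], aligned with the
    window end, i.e. with x[i + w - 1].
    """
    w = max(1, int(round(eps / dt)))          # fine steps per smoothing window
    c = np.concatenate(([0.0], np.cumsum(x)))
    return (c[w:] - c[:-w]) / w, w

def rho_hat(x, dt, eps):
    """Empirical L^4 distance between Y^eps and X at the same times."""
    y, w = local_smoothing(x, dt, eps)
    return np.mean((y - x[w - 1:]) ** 4) ** 0.25

rng = np.random.default_rng(1)
dt = 1e-3
x = simulate_ou_fine(n=400_000, dt=dt, gamma=1.0, rng=rng)
for eps in (0.1, 0.01):
    print(eps, rho_hat(x, dt, eps))
```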


Volatility Processes and Heston joint SDEs. A striking example of indirect observability is quite ubiquitous in stochastic modeling of joint price and volatility for stock market data. The well known Heston model (see [38]) links the price $S_t$ and the squared volatility $V_t$ of an asset by parametrized joint SDEs of the form

$$dS_t = \mu S_t\,dt + \sqrt{V_t}\,S_t\,dW_t^1,$$
$$dV_t = \kappa(\theta - V_t)\,dt + \sigma\sqrt{V_t}\,dW_t^2,$$

where the unknown positive parameters $\mu, \kappa, \theta, \sigma$ need to be estimated from asset price data $S_t$ only, since the squared instantaneous volatility $V_t$ is not directly available and plays the part of our unobservable process $X_t = V_t$. In our indirect observability framework, volatility approximations $Y_t^{\varepsilon}$ based either on prices $S_t$ or on observed option prices become the observable processes. A classical volatility approximation is the “realized volatility” given by the normalized sum of squared returns

$$Y_t^{\varepsilon} = \frac{1}{M\varepsilon}\sum_{k=1}^{M}\left(R_{t_k} - R_{t_{k-1}}\right)^2, \quad \text{where } dR_t = \frac{dS_t}{S_t}. \tag{33}$$


In this equation, the time step size $t_k - t_{k-1} = \varepsilon$ and the window size $M = M(\varepsilon)$ are user selected. The $L^2$ convergence of realized volatility to instantaneous volatility as $\varepsilon \to 0$ is studied in [??]. In a companion paper to be published [5], we have proved that the pair $(Y_t^{\varepsilon}, V_t)$ verifies all the hypotheses of our indirect observability setup, and we have completed a detailed analysis of parameter estimation under indirect observability for generic Heston models (see also [??]).
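A sketch of (33) on simulated data, under stated assumptions: the Heston SDEs are discretized by an Euler scheme with full truncation of $V$ (a common choice, not the authors' method), the Brownian motions $W^1, W^2$ are taken independent, simple returns $(S_{t_k} - S_{t_{k-1}})/S_{t_{k-1}}$ stand in for increments of $R_t$, and all parameter values are arbitrary. The helper names are hypothetical.

```python
import math
import numpy as np

def realized_volatility(S, eps, M):
    """Realized volatility (33): sum of M squared returns divided by M * eps.

    Returns over consecutive grid steps of size eps approximate dR_t = dS_t / S_t;
    value i uses returns i, ..., i + M - 1.
    """
    R = np.diff(S) / S[:-1]
    c = np.concatenate(([0.0], np.cumsum(R ** 2)))
    return (c[M:] - c[:-M]) / (M * eps)

def simulate_heston(mu, kappa, theta, sigma, V0, S0, eps, n, rng):
    """Euler scheme with full truncation of V; W^1 and W^2 are taken independent here."""
    S = np.empty(n + 1)
    V = np.empty(n + 1)
    S[0], V[0] = S0, V0
    dW = rng.normal(0.0, math.sqrt(eps), size=(n, 2))
    for i in range(n):
        vp = max(V[i], 0.0)                   # full truncation keeps sqrt well defined
        S[i + 1] = S[i] * (1.0 + mu * eps + math.sqrt(vp) * dW[i, 0])
        V[i + 1] = V[i] + kappa * (theta - vp) * eps + sigma * math.sqrt(vp) * dW[i, 1]
    return S, V

rng = np.random.default_rng(2)
eps, M = 1e-4, 400                            # grid step and window size (user selected)
S, V = simulate_heston(0.05, 2.0, 0.04, 0.3, 0.04, 100.0, eps, 200_000, rng)
Y = realized_volatility(S, eps, M)            # Y[i] covers the returns over [i eps, (i + M) eps]
print(Y.mean(), V.mean())
```

Comparing `Y` with `V` at the window ends gives a rough empirical view of the approximation error $Y_t^{\varepsilon} - V_t$ for a chosen pair $(\varepsilon, M)$.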


Averaged Multiscale Stochastic Systems. Consider a “slow-fast” joint SDE system (see [??] for an overview and references) involving a (small) scale parameter $\varepsilon$ and given by

$$dx_t = a(x_t, y_t)\,dt + b(x_t, y_t)\,dW_1(t), \tag{34}$$

$$dy_t = \frac{1}{\varepsilon}\,c(x_t, y_t)\,dt + \frac{1}{\sqrt{\varepsilon}}\,d(x_t, y_t)\,dW_2(t), \tag{35}$$

where $W_1(t)$ and $W_2(t)$ are independent Brownian motions and the coefficients $a, b, c, d$ are bounded smooth functions of $x, y$. Note that the diffusions $x_t, y_t$ actually depend on the scale parameter $\varepsilon$. Assume that for any fixed $x$, the “fast” SDE driving $y_t$ has a stationary distribution $q(y|x)$ verifying $E_q[a(x, y)] \ne 0$. Then, under mild complementary conditions on $a, b, c, d$ and as $\varepsilon \to 0$, the process $x_t$ converges in probability to the “reduced dynamics”

$$dX_t = A(X_t)\,dt + B(X_t)\,dW_1(t), \tag{36}$$

where $A(x) = E_q[a(x, y)]$ and $B B^{*} = E_q[b b^{*}]$. Convergence in probability implies $L^p$ convergence for variables bounded in $L^q$ with $q > p$, and such convergence of $x_t$ to $X_t$ is proved in [??] for periodic coefficients and $x_t, y_t$ on a torus. In practical applications, one essentially wants to parametrize the slow asymptotic SDE (36) driving the unobservable process $X_t$, and the only realistically accessible data are generated by the approximating process $x_t$, since for small $\varepsilon$ the $y_t$ data are too noisy to be reliably acquired. Hence the slow process $x_t$ plays the role of the observable process $Y_t^{\varepsilon}$.

There are many practical applications where $A(X_t)$ has polynomial nonlinearities. When $X_t$ is one-dimensional, this trivially corresponds to the case of gradient diffusions discussed earlier in this section. For multi-dimensional $X_t$, one can conjecture that the mixing rates for the process $X_t$ must obey exponential decay unless equation (36) possesses some unusual properties (e.g. special symmetries or existence of conserved quantities). It is then quite reasonable to expect exponential convergence to the equilibrium distribution and, thus, exponentially fast decorrelation rates of lagged 4th moments for large lags. Moreover, exponentially fast decorrelation rates have been demonstrated numerically in many practical examples. So we expect that our key assumptions on the unobservable and observable processes will be satisfied for many multiscale examples of the form (36) and (34), respectively.
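A practical ingredient of (36) is the averaged drift $A(x) = E_q[a(x, y)]$. The minimal sketch below illustrates the averaging principle by computing $A(x)$ as a long-time average of $a(x, y)$ along the fast equation (35) with $x$ frozen, run in the fast time $s = t/\varepsilon$. The coefficients are illustrative assumptions: $a(x, y) = \sin y - \tanh x$, and a linear fast drift $c(x, y) = 1 - y$ with $d = \sqrt{2}$. This drift is not bounded, but it makes $q(y|x) = \mathcal{N}(1, 1)$ explicit, so the estimate can be compared with the exact value $e^{-1/2}\sin 1 - \tanh x$. With a constant $b$ one simply has $B = b$.

```python
import math
import numpy as np

def averaged_drift(a, x, n_steps, h, rng):
    """Estimate A(x) = E_q[a(x, y)] by a time average along the fast equation with x frozen.

    Toy fast dynamics (an assumption): in the fast time s = t / eps,
    dy = (1 - y) ds + sqrt(2) dW, whose stationary law q(y | x) is N(1, 1).
    """
    decay = math.exp(-h)
    noise = rng.normal(0.0, math.sqrt(1.0 - decay**2), size=n_steps)
    y, total = rng.normal(1.0, 1.0), 0.0
    for i in range(n_steps):
        y = 1.0 + (y - 1.0) * decay + noise[i]    # exact OU step of length h
        total += a(x, y)
    return total / n_steps

def a(x, y):
    """Illustrative bounded slow drift a(x, y) = sin(y) - tanh(x)."""
    return math.sin(y) - math.tanh(x)

rng = np.random.default_rng(3)
x0 = 0.5
A_hat = averaged_drift(a, x0, n_steps=200_000, h=0.1, rng=rng)
# For y ~ N(1, 1), E[sin y] = exp(-1/2) sin(1), so A(x) = exp(-1/2) sin(1) - tanh(x)
A_exact = math.exp(-0.5) * math.sin(1.0) - math.tanh(x0)
print(A_hat, A_exact)
```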

7 Conclusions 

For stationary processes $X_t \in \mathbb{R}^r$ which are not directly observable, but can be approximated in $L^4$ by observable processes $Y_t^{\varepsilon}$ as $\varepsilon \to 0$, we have developed a mathematical framework where the unknown parameter vector $\theta$ parametrizing $X_t$ can be consistently and efficiently estimated from adequately sub-sampled observations of $Y_t^{\varepsilon}$. The present paper substantially extends several of our earlier results [??] to non-Gaussian stationary processes $X_t$.

We have focused on the “sparsely parametrized” situations where $\theta$ is a (generally non explicit) smooth function $G(\Psi_1, \dots, \Psi_p)$ of $p$ lagged moments of order $\le 2$ of $X_t$. We conjecture that this holds true when $X_t$ is a stationary multi-dimensional diffusion, provided the matrix diffusion coefficients $\sigma(x, \theta)$ and the drift $b(x, \theta)$ of $X_t$ are analytic in $x$ and $\theta$.

The above setup leads us to study the class of parameter estimators of the form $\hat{\theta} = G(\hat{\Psi}^{\varepsilon})$, where $\hat{\Psi}^{\varepsilon}$ is an observable empirical estimator of $\Psi$ based on the $N(\varepsilon)$ sub-sampled observable data $Y^{\varepsilon}_{n\Delta}$ with $n = 1, \dots, N(\varepsilon)$. For parameter estimators such as $\hat{\theta}$, the analysis of consistency and speed of convergence is essentially equivalent to a similar but more technical analysis for observable sub-sampled estimators $\hat{K}_{Y^{\varepsilon}}(u)$ of the lagged covariances $K(u)$ of $X_t$. Note that for $u > 0$, the estimators $\hat{K}_{Y^{\varepsilon}}(u)$ involve only non-vanishing time lags $u(\varepsilon)$ (with $u(\varepsilon) \to u > 0$ as $\varepsilon \to 0$), since vanishing time lags decrease robustness to data perturbations (see [??]).

We explicitly determine how to best choose the sub-sampling time step $\Delta(\varepsilon)$ and the number $N(\varepsilon)$ of observations in terms of the distance $\rho(\varepsilon) = \|Y_t^{\varepsilon} - X_t\|_4$ (see equation (??)). Our asymptotically optimal sub-sampling schemes $N(\varepsilon) \sim \rho^{-3}(\varepsilon)$ and $\Delta(\varepsilon) \sim \rho(\varepsilon)$ are constructed to simultaneously minimize the amplitude of estimation errors and the computational/observational complexity due to both the number $N(\varepsilon)$ and the time span $S(\varepsilon) = N(\varepsilon)\Delta(\varepsilon)$ of sub-sampled observable data. Indeed, in many practical situations such as joint dynamic modeling of observable stock prices and unobservable volatilities, both $N(\varepsilon)$ and $S(\varepsilon)$ must remain rather moderate, even for intraday data. Our optimal sub-sampling results rely on a key hypothesis, stating that for $s < t < u < v$ the random variables $X_s X_t$ and $X_{u+T} X_{v+T}$ decorrelate at an integrable rate $f(T) \to 0$ when $T \to \infty$ (see (22)). This is expected to hold in the many practical situations where $X_t$ is a stationary multi-dimensional diffusion with exponentially fast mixing.
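The orders of magnitude in this scheme can be read off the error bound (56) of the appendix: with $\Delta \sim \rho$ and $N\Delta \sim \rho^{-2}$, both leading terms $\Gamma/\sqrt{N\Delta}$ and $\lambda\Delta$ are of order $\rho$, matching the size of the observation error. A small sketch of this bookkeeping; the constants `c_delta`, `c_span`, `gamma_const` and `lip_const` are placeholders, not values from the paper.

```python
import math

def optimal_subsampling(rho, c_delta=1.0, c_span=1.0):
    """Scheme Delta = c_delta * rho and N ~ c_span / (Delta * rho^2), i.e. N ~ rho^(-3).

    The time span N * Delta is then about c_span / rho^2.
    """
    delta = c_delta * rho
    n_obs = math.ceil(c_span / (delta * rho**2))
    return delta, n_obs

def leading_error(delta, n_obs, gamma_const, lip_const):
    """Leading terms gamma / sqrt(N Delta) + lambda * Delta of the bound (56)."""
    return gamma_const / math.sqrt(n_obs * delta) + lip_const * delta

for rho in (1e-1, 1e-2, 1e-3):
    delta, n_obs = optimal_subsampling(rho)
    ratio = leading_error(delta, n_obs, gamma_const=2.0, lip_const=1.0) / rho
    print(f"rho={rho:g}  Delta={delta:g}  N={n_obs}  error/rho={ratio:.3f}")
```

The ratio error/rho stays constant as $\rho$ decreases, which is the sense in which the scheme achieves the rate $c\rho(\varepsilon)$.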

When $X_t$ is sparsely parametrized and has integrable decorrelation rate $f$, we prove that as $\varepsilon \to 0$, the sub-sampled observable estimators of lagged covariances determined by our optimal sub-sampling scheme (27) converge in $L^2$ to the true lagged covariances of $X_t$, with $L^2$-speeds of convergence faster than $c\rho(\varepsilon)$ for some constant $c$, and that the decorrelation rate $f$ only affects the constant $c$ via the integral $I(f)$ of $f$. Our associated observable sub-sampled parameter estimators $\hat{\theta}^{\varepsilon} = G(\hat{\Psi}^{\varepsilon})$ are then consistent in probability when $\varepsilon \to 0$. In practical applications, the unknown $\theta$ is of course a priori bounded, and then a natural truncation of the estimators $\hat{\theta}^{\varepsilon}$ guarantees their $L^2$-convergence to $\theta$ at $L^2$-speed faster than some constant multiple of $\rho(\varepsilon)$.

Our work thus highlights the practical impact of numerical methods enabling fast evaluation of $\rho(\varepsilon)$, both to help determine nearly optimal sub-sampling schemes and to compute approximate error bars on parameter estimators. We will study numerical applications of our approach for non-Gaussian $X_t$ in subsequent papers.

Our indirect observability study has strong practical consequences for a broad range of applications. Let us mention two examples. In financial mathematics, our indirect observability setup potentially applies to many stochastic volatility models driving the price and volatility of a given asset. The observable $Y_t^{\varepsilon}$ can then be a realized volatility estimated on a time window depending on $\varepsilon$, and $X_t$ is the unobservable instantaneous volatility. For the well known Heston joint SDEs, our approach has enabled the construction of consistent and efficient explicit parameter estimators based on optimally sub-sampled realized volatility data [5, 62].

A second class of examples concerns complex multiscale systems driving atmospheric or ocean dynamics. In this case the numerical datasets generated by known high dimensional fluid evolution models can be analyzed by artificially inserting a small scaling parameter $\varepsilon$ into the model, to further accelerate the fast variables and numerically analyze (as $\varepsilon$ varies) the behavior of parameter estimators for key parameters of the slow dynamics. This computer-intensive version of our approach should yield both a concrete and efficient optimal sub-sampling scheme and approximate error bars for our parameter estimators. We will present detailed examples in further publications.
A $L^2$-consistency for unobservable estimators of lagged 2nd order moments

In this appendix we present a detailed proof of Theorem 2, which addresses the $L^2$-consistency results for the unobservable sub-sampled empirical estimators $\bar{X}_N$ and $\hat{K}_N(u)$ of means and lagged covariances. The hypotheses and notations are those of Theorem 2. Replacing $X_t$ by the centered process $X_t - \mu$ and setting $\mu = 0$ is a trivial change in the proof, so we only need to consider the case where all $X_t$ are centered and $\mu = 0$.

Step 1. Sums of decorrelation values. For all $D > 0$ and $j \ge 1$ one has $D f(jD) \le \int_{(j-1)D}^{jD} f(T)\,dT$, since the decorrelation rate $f(T)$ is decreasing. This implies

$$\sum_{j=1}^{\infty} f(jD) \le \frac{1}{D}\int_0^{\infty} f(T)\,dT = \frac{I(f)}{D}. \tag{37}$$

Define the function $g(q, D)$ for all integers $q \ge 2$ and all $D > 0$ by

$$g(q, D) = \sum_{1 \le m < n \le q} f((n-m)D) = \sum_{j=1}^{q-1} (q - j)\, f(jD). \tag{38}$$
Due to (37), the following inequality holds for all $D > 0$ and $q \ge 2$:

$$g(q, D) \le (q-1)\sum_{j=1}^{q-1} f(jD) \le (q-1)\,\frac{I(f)}{D}. \tag{39}$$
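The elementary bounds (37), (38) and (39) can be checked numerically for a concrete rate. The sketch assumes $f(T) = e^{-T}$, which is decreasing and integrable with $I(f) = 1$, and compares the two expressions of $g(q, D)$ in (38) with each other and with the bound (39).

```python
import math

def f(T):
    """Example decreasing, integrable decorrelation rate with I(f) = 1."""
    return math.exp(-T)

I_F = 1.0

def g(q, D):
    """g(q, D) from (38), via the single-sum form sum_{j=1}^{q-1} (q - j) f(j D)."""
    return sum((q - j) * f(j * D) for j in range(1, q))

def g_double_sum(q, D):
    """g(q, D) from (38), via the double sum over 1 <= m < n <= q."""
    return sum(f((n - m) * D) for m in range(1, q + 1) for n in range(m + 1, q + 1))

D = 0.1
tail = sum(f(j * D) for j in range(1, 5000))
print(tail, I_F / D)                         # inequality (37): tail <= I(f) / D
for q in (2, 10, 50):
    print(q, g(q, D), g_double_sum(q, D), (q - 1) * I_F / D)   # (38) and (39)
```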

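Step 2. Sub-sampled empirical means converge in $L^2$. Fix an integer $j \in [1 \dots r]$. Denote the $j$-th coordinates of $X_{n\Delta}$ and of the empirical mean estimator $\bar{X}_N$ by

$$U_n = X_{n\Delta}(j) \quad \text{and} \quad \bar{X}_N(j) = \frac{1}{N}(U_1 + \dots + U_N).$$

With the notation $s_j^2 = E(U_n^2)$, which does not depend on $n$ by stationarity, this implies

$$N^2 E[(\bar{X}_N(j))^2] = N s_j^2 + 2\sum_{1 \le m < n \le N} E[U_m U_n]. \tag{40}$$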

Applying the decorrelation hypothesis (22) and the relations (38), (39), we obtain

$$\sum_{1 \le m < n \le N} E(U_m U_n) \le \sum_{1 \le m < n \le N} f((n-m)\Delta) = g(N, \Delta) \le I(f)\,\frac{N}{\Delta}.$$

Denoting by $\nu$ the uniform bound on the $L^4$-norm of $X_t$, one also has $s_j = \|U_n\|_2 \le \|X_t\|_4 \le \nu$. Hence, (40) implies

$$\|\bar{X}_N(j)\|_2^2 \le \frac{\nu^2}{N} + \frac{2I(f)}{N\Delta}.$$

For any $(r_1 \times r_2)$ random matrix $M$ and any $q \ge 1$, our definition of the norm $\|M\|_q$ implies

$$\|M\|_q \le (r_1 r_2)^{1/q}\max_{i,j}\|M_{ij}\|_q. \tag{41}$$

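The inequality $\|\bar{X}_N\|_2 \le \sqrt{r}\,\max_j\|\bar{X}_N(j)\|_2$, due to (41), then yields

$$\|\bar{X}_N\|_2 \le \sqrt{r}\left(\frac{\nu^2}{N} + \frac{2I(f)}{N\Delta}\right)^{1/2} \le \frac{\nu(r\Delta)^{1/2} + (2rI(f))^{1/2}}{\sqrt{N\Delta}}.$$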

Since $\Delta(\varepsilon) \to 0$, this proves the $L^2$-bound in (24) when $X_t$ is centered, and hence in general.
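For a Gaussian example the left-hand side of (40) is available in closed form, so the Step 2 bound can be checked directly. The sketch assumes a unit-variance Ornstein-Uhlenbeck process, for which $|E(U_m U_n)| = e^{-|n-m|\Delta}$. One can then take $f(T) = e^{-T}$, $I(f) = 1$ and $s_j^2 = 1$; the function names are illustrative.

```python
import math

def var_subsampled_mean(N, delta, gamma=1.0):
    """Exact Var((U_1 + ... + U_N) / N) with U_n = X_{n delta}, X a unit-variance OU process.

    Uses E[U_m U_n] = exp(-gamma |n - m| delta) and the pair count (N - j) at gap j.
    """
    a = math.exp(-gamma * delta)
    total = N + 2 * sum((N - j) * a**j for j in range(1, N))
    return total / N**2

def step2_bound(N, delta, s2=1.0, I_f=1.0):
    """Right-hand side s_j^2 / N + 2 I(f) / (N delta) of Step 2."""
    return s2 / N + 2 * I_f / (N * delta)

for N, delta in ((100, 0.5), (1000, 0.1), (5000, 0.01)):
    print(N, delta, var_subsampled_mean(N, delta), step2_bound(N, delta))
```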

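Step 3. Sub-sampled empirical means converge in $L^4$. Basic algebra yields the identity

$$N^4(\bar{X}_N(j))^4 = \sum_{a,b,m,n} U_a U_b U_m U_n = S_1 + S_2 + 12S_3 + 24S_4, \tag{42}$$

where the sums $S_1$, $S_2$, $S_3$ and $S_4$ are defined by

$$S_1 = \sum_{1 \le m \le N} U_m^4,$$

$$S_2 = \sum_{1 \le m < n \le N}\left[6U_m^2U_n^2 + 4U_m^3U_n + 4U_mU_n^3\right],$$

$$S_3 = \sum_{1 \le a < m < n \le N}\left[U_a^2U_mU_n + U_aU_m^2U_n + U_aU_mU_n^2\right],$$

$$S_4 = \sum_{1 \le a < b < m < n \le N} U_aU_bU_mU_n.$$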
Due to the assumption that the $L^4$-norm of $X_t$ is bounded uniformly by $\nu$, Hölder's inequality gives $|E(U_m^{\alpha}U_n^{\beta})| \le \nu^4$ whenever $\alpha + \beta = 4$, so one clearly has $|E(S_1)| \le N\nu^4$ and $|E(S_2)| \le 7N^2\nu^4$.

Since we are considering the centered process $X_t$, $E(U_n) = 0$, and for $a < m < n$ the decorrelation hypothesis implies

$$|E(U_a^2U_mU_n)| = |E(U_a^2U_mU_n) - E(U_a^2U_m)E(U_n)| \le f((n-m)\Delta).$$

Similarly, one shows that

$$|E(U_aU_m^2U_n)| \le f((n-m)\Delta) \quad \text{and} \quad |E(U_aU_mU_n^2)| \le f((m-a)\Delta).$$

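These bounds and definition (38) yield (for $N \ge 3$)

$$|E(S_3)| \le \sum_{1 \le a < m < n \le N}\left[2f((n-m)\Delta) + f((m-a)\Delta)\right] = 3\sum_{2 \le m \le N-1} g(m, \Delta),$$

which implies, due to the bound (39),

$$|E(S_3)| \le \frac{3I(f)}{\Delta}\sum_{2 \le m \le N-1}(m-1) \le \frac{3I(f)N^2}{2\Delta}.$$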

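As above, one also has $|E(U_aU_bU_mU_n)| \le f((b-a)\Delta)$ for $a < b < m < n$. The expressions of $S_4$ and $g$ then yield (for $N \ge 4$)

$$|E(S_4)| \le \sum_{1 \le a < b < m < n \le N} f((b-a)\Delta) = \sum_{3 \le m < n \le N} g(m-1, \Delta) = \sum_{3 \le m \le N-1}(N-m)\,g(m-1, \Delta).$$

Therefore, due to (39), we obtain for $N \ge 4$

$$|E(S_4)| \le \frac{I(f)}{\Delta}\sum_{3 \le m \le N-1}(N-m)(m-2) \le \frac{I(f)N^3}{4\Delta}.$$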


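Finally, the bounds on $|E(S_k)|$ and equation (42) entail

$$E\left[(\bar{X}_N(j))^4\right] \le \frac{\nu^4}{N^3} + \frac{7\nu^4}{N^2} + \frac{18I(f)}{N^2\Delta} + \frac{6I(f)}{N\Delta} \le \frac{C}{N\Delta} \tag{43}$$

for some explicit constant $C$, since $N(\varepsilon) \to \infty$ and $\Delta(\varepsilon) \to 0$ with $N(\varepsilon)\Delta(\varepsilon) \to \infty$. In particular, for $\varepsilon$ small enough, one can clearly take $C = 7I(f)$. Therefore, equations (41) and (43) imply

$$\|\bar{X}_N(j)\|_4 \le \frac{C^{1/4}}{(N\Delta)^{1/4}} \quad \text{and} \quad \|\bar{X}_N\|_4 \le \frac{(rC)^{1/4}}{(N\Delta)^{1/4}},$$

which proves the $L^4$-bound in (24).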
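Expanding $(U_1 + \dots + U_N)^4$ into sums over coincident and distinct indices is easy to get wrong. A brute-force check on a random vector (an illustrative test, not part of the proof) confirms the multiplicities 1, (6, 4, 4), 12 and 24 for the four index patterns; indices start at 0 in the code.

```python
import itertools
import random

def fourth_power_sums(U):
    """Return S_1, S_2, S_3, S_4 of (42) for a finite sequence U (indices start at 0 here)."""
    idx = range(len(U))
    S1 = sum(u**4 for u in U)
    S2 = sum(6 * U[m]**2 * U[n]**2 + 4 * U[m]**3 * U[n] + 4 * U[m] * U[n]**3
             for m, n in itertools.combinations(idx, 2))
    S3 = sum(U[a]**2 * U[m] * U[n] + U[a] * U[m]**2 * U[n] + U[a] * U[m] * U[n]**2
             for a, m, n in itertools.combinations(idx, 3))
    S4 = sum(U[a] * U[b] * U[m] * U[n]
             for a, b, m, n in itertools.combinations(idx, 4))
    return S1, S2, S3, S4

rng = random.Random(0)
U = [rng.gauss(0.0, 1.0) for _ in range(7)]
S1, S2, S3, S4 = fourth_power_sums(U)
lhs = sum(U) ** 4
rhs = S1 + S2 + 12 * S3 + 24 * S4
print(lhs, rhs)
```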

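Step 4. Convergence of empirical lagged covariance matrices estimators. Introduce the short-hand notations $V_n = X_{n\Delta}$ and

$$\bar{V}_N = \frac{1}{N}\sum_{n=1}^{N} V_n = \bar{X}_N, \tag{44}$$

$$\tau\bar{V}_N = \frac{1}{N}\sum_{n=1}^{N} V_{n+k} = \frac{1}{N}\sum_{n=1}^{N} X_{(n+k)\Delta}, \tag{45}$$

$$W_N = \frac{1}{N}\sum_{n=1}^{N} V_n V_{n+k}^{*}, \tag{46}$$

where $k = k(u, \varepsilon)$ is the discrete lag associated with $u$.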
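From the definition (12), the covariance matrix estimators $\hat{K}_N(u)$ can be rewritten as

$$\hat{K}_N(u) = W_N - \bar{V}_N(\tau\bar{V}_N)^{*}. \tag{47}$$

First, we evaluate the term $\bar{V}_N(\tau\bar{V}_N)^{*}$ in the equation above. Impose $0 \le u \le A$ for some fixed $A$. Since $\bar{V}_N - \tau\bar{V}_N = \frac{1}{N}\left(\sum_{n=1}^{k} V_n - \sum_{n=N+1}^{N+k} V_n\right)$ involves $2k$ terms of $L^4$-norm at most $\nu$, we have by construction (for $\Delta \le 1$)

$$\|\bar{V}_N - \tau\bar{V}_N\|_4 \le \frac{2k\nu}{N} \le \frac{2(u + \Delta)\nu}{N\Delta} \le \frac{2(1 + A)\nu}{N\Delta},$$

and applying the inequality (18), together with $\|\bar{V}_N\|_4 \le \nu$, one arrives at the following relation

$$\|\bar{V}_N(\tau\bar{V}_N)^{*} - \bar{V}_N\bar{V}_N^{*}\|_2 \le 4(1 + A)\nu^2\,\frac{1}{N\Delta}. \tag{48}$$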


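Since $\mu = 0$, we also have

$$\|\bar{V}_N\|_4 = \|\bar{X}_N\|_4 \le \frac{(rC)^{1/4}}{(N\Delta)^{1/4}},$$

as proven in Step 3. This implies, by inequality (18),

$$\|\bar{V}_N\bar{V}_N^{*}\|_2 \le 2\left(\frac{(rC)^{1/4}}{(N\Delta)^{1/4}}\right)^2 = \frac{2(rC)^{1/2}}{(N\Delta)^{1/2}},$$

which yields, due to equation (48),

$$\|\bar{V}_N(\tau\bar{V}_N)^{*}\|_2 \le \frac{2(rC)^{1/2}}{\sqrt{N\Delta}} + \frac{4(1 + A)\nu^2}{N\Delta}. \tag{49}$$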

By the construction of $k = k(u, \varepsilon)$, the “discrete” lag $k\Delta$ is close to the continuous lag $u$, with $|k\Delta - u| < \Delta$. Since the true lagged covariance matrices $K(u)$ are locally Lipschitz, there is a constant $\lambda = \lambda(A)$ such that for all $0 \le u \le A$ and all $\varepsilon > 0$ the following deterministic inequality holds:

$$\|K(u) - K(k\Delta)\| \le \lambda\,|u - k\Delta| \le \lambda\Delta. \tag{50}$$

Next, we compare the term $W_N$ in the expression of the covariance estimator (47) with the true covariance matrix $K(k\Delta)$ evaluated at the “discretized” time lag $k\Delta$. Since $X_t$ is stationary, we have $K(k\Delta) = E(V_n V_{n+k}^{*})$ for all $n$, and formula (46) implies that

$$W_N - K(k\Delta) = \frac{1}{N}\sum_{n=1}^{N}\left(V_n V_{n+k}^{*} - E[V_n V_{n+k}^{*}]\right).$$

For any two coordinates $i, j \in [1 \dots r]$, denote by $T_n = V_n(i)$ and $U_n = V_n(j)$ the $i$-th and $j$-th coordinates of $V_n$, respectively. In addition, define

$$H_n = T_n U_{n+k} - E[T_n U_{n+k}].$$

Clearly $E[H_n] = 0$, and the $(i, j)$ coefficient of the matrix $M = W_N - K(k\Delta)$ is then

$$M_{ij} = \frac{1}{N}\sum_{n=1}^{N} H_n,$$

so that

$$N^2 E[M_{ij}^2] = \sum_{(m,n) \in [1 \dots N]^2} E[H_m H_n]. \tag{51}$$

Next, we partition the summation set in the expression above into two complementary sets, $Q^{+}$ and $Q^{-}$, as follows:
• $(m, n) \in Q^{+}$ whenever $|n - m| > k$ and $m, n \in [1 \dots N]$;

• $(m, n) \in Q^{-}$ whenever $|n - m| \le k$ and $m, n \in [1 \dots N]$.

Due to the bounded fourth moments of the process $X_t$, we have $|E(H_m H_n)| \le 2\nu^4$. Moreover,

$$\mathrm{card}(Q^{-}) = N + k(2N - k - 1) \le 3Nk,$$

and, therefore,

$$\sum_{(m,n) \in Q^{-}} E(H_m H_n) \le 6\nu^4 N k. \tag{52}$$

For $(m, n) \in Q^{+}$, the decorrelation rate of the 2nd order moments yields

$$|E[H_m H_n]| \le f(|n - m|\Delta),$$

so that

$$\sum_{(m,n) \in Q^{+}} E[H_m H_n] \le \sum_{(m,n) \in Q^{+}} f(|n - m|\Delta). \tag{53}$$

Thus, relation (51) and inequalities (52), (53) imply

$$N^2 E[M_{ij}^2] \le \sum_{(m,n) \in Q^{+}} f(|n - m|\Delta) + 6\nu^4 N k. \tag{54}$$


Easy algebra transforms the double sum above into

$$\sum_{(m,n) \in Q^{+}} f(|n - m|\Delta) = 2\sum_{k+1 \le s \le N-1}(N - s)\,f(s\Delta) \le 2N\sum_{1 \le s \le N-1} f(s\Delta) \le 2I(f)\,\frac{N}{\Delta},$$

where equation (37) was used in the last inequality. Recall that for $0 \le u \le A$, and due to the construction of $k$, one also has (for $\Delta \le 1$)

$$k \le \frac{u}{\Delta} + 1 \le \frac{A + 1}{\Delta}.$$

Substituting the last two expressions into (54), we obtain

$$E[M_{ij}^2] \le \left(2I(f) + 6(A + 1)\nu^4\right)\frac{1}{N\Delta}.$$
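The counting steps of Step 4, namely $\mathrm{card}(Q^{-}) = N + k(2N - k - 1) \le 3Nk$ and the reduction of the sum over $Q^{+}$ to a single sum over the gap $s = |n - m|$, can be confirmed by brute force for small $N$ and $k$. In the sketch, `f(s)` stands for $f(s\Delta)$, and the geometric choice $0.9^s$ is arbitrary.

```python
def card_q_minus(N, k):
    """Brute-force count of pairs (m, n) in [1..N]^2 with |n - m| <= k."""
    return sum(1 for m in range(1, N + 1) for n in range(1, N + 1) if abs(n - m) <= k)

def q_plus_sum(N, k, f):
    """Brute-force sum of f(|n - m|) over pairs (m, n) in [1..N]^2 with |n - m| > k."""
    return sum(f(abs(n - m)) for m in range(1, N + 1) for n in range(1, N + 1) if abs(n - m) > k)

def q_plus_formula(N, k, f):
    """Closed form 2 * sum_{s=k+1}^{N-1} (N - s) f(s)."""
    return 2 * sum((N - s) * f(s) for s in range(k + 1, N))

for N, k in ((10, 1), (25, 4), (40, 7)):
    # Exact cardinality of Q^- and the bound card(Q^-) <= 3 N k
    assert card_q_minus(N, k) == N + k * (2 * N - k - 1) <= 3 * N * k
    # The Q^+ double sum reduces to a single sum over the gap s
    assert abs(q_plus_sum(N, k, lambda s: 0.9**s) - q_plus_formula(N, k, lambda s: 0.9**s)) < 1e-9
print("counting identities hold on the test grid")
```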

By equation (41), we further obtain

$$\|W_N - K(k\Delta)\|_2 = \|M\|_2 \le r\left(2I(f) + 6(A + 1)\nu^4\right)^{1/2}\frac{1}{\sqrt{N\Delta}}. \tag{55}$$


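Using the expression for $\hat{K}_N(u)$ in (47) and the triangle inequality, we can write

$$\|\hat{K}_N(u) - K(u)\|_2 \le \|W_N - K(k\Delta)\|_2 + \|K(k\Delta) - K(u)\|_2 + \|\bar{V}_N(\tau\bar{V}_N)^{*}\|_2.$$

Combining the three bounds in (49), (50) and (55), we obtain, for all $\varepsilon > 0$ and $0 \le u \le A$,

$$\|\hat{K}_N(u) - K(u)\|_2 \le \frac{\Gamma}{\sqrt{N\Delta}} + \frac{4(1 + A)\nu^2}{N\Delta} + \lambda\Delta, \tag{56}$$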
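where

$$\Gamma = 2(rC)^{1/2} + r\left(2I(f) + 6(A + 1)\nu^4\right)^{1/2}.$$

Moreover, for $\varepsilon$ small enough, we can take $C = 7I(f)$ as discussed in Step 3, and $4(1 + A)\nu^2/(N\Delta)$ becomes much smaller than $\Gamma/\sqrt{N\Delta}$. Therefore, for $\varepsilon$ small enough, one has (using $\sqrt{a + b} \le \sqrt{a} + \sqrt{b}$ and $\sqrt{r} \le r$) a simplified expression for the constant $\Gamma$:

$$\Gamma \le 2\sqrt{7rI(f)} + r\sqrt{2I(f)} + \sqrt{6}\,r\nu^2\sqrt{A + 1} \le \gamma = 8r\sqrt{I(f)} + 2.5\,r\nu^2\sqrt{A + 1}.$$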
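The simplification of the constant relies only on $\sqrt{a + b} \le \sqrt{a} + \sqrt{b}$, $\sqrt{r} \le r$, $2\sqrt{7} + \sqrt{2} < 8$ and $\sqrt{6} < 2.5$. The sketch below evaluates the constant, written as $2\sqrt{7rI(f)} + r\left(2I(f) + 6(A + 1)\nu^4\right)^{1/2}$ (that is, with $C = 7I(f)$), against the simplified constant on an arbitrary parameter grid.

```python
import math

def gamma_exact(r, I_f, nu, A):
    """Gamma = 2 (r C)^{1/2} + r (2 I(f) + 6 (A + 1) nu^4)^{1/2} with C = 7 I(f)."""
    return 2 * math.sqrt(7 * r * I_f) + r * math.sqrt(2 * I_f + 6 * (A + 1) * nu**4)

def gamma_simplified(r, I_f, nu, A):
    """Simplified constant 8 r sqrt(I(f)) + 2.5 r nu^2 sqrt(A + 1)."""
    return 8 * r * math.sqrt(I_f) + 2.5 * r * nu**2 * math.sqrt(A + 1)

worst = max(gamma_exact(r, I_f, nu, A) / gamma_simplified(r, I_f, nu, A)
            for r in (1, 2, 5, 10) for I_f in (0.01, 1.0, 100.0)
            for nu in (0.1, 1.0, 10.0) for A in (0.0, 1.0, 10.0))
print(worst)   # guaranteed < 1 by the inequalities listed above
```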

This concludes the proof of Theorem 2.